From patchwork Wed Apr 8 05:02:58 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479357
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com,
    namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de,
    vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com,
    mihai.carabas@oracle.com, kvm@vger.kernel.org,
    xen-devel@lists.xenproject.org,
    virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 01/26] x86/paravirt: Specify subsection in PVOP macros
Date: Tue, 7 Apr 2020 22:02:58 -0700
Message-Id: <20200408050323.4237-2-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>
X-Mailing-List: kvm@vger.kernel.org

Allow PVOP macros to specify a subsection such that _paravirt_alt()
can optionally put sites in .parainstructions.*.

Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/paravirt_types.h | 158 +++++++++++++++++---------
 1 file changed, 102 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 732f62e04ddb..37e8f27a3b9d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -337,6 +337,9 @@ struct paravirt_patch_template {
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
 
+/* Sub-section for .parainstructions */
+#define PV_SUFFIX ""
+
 #define PARAVIRT_PATCH(x) \
     (offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
@@ -350,9 +353,9 @@ extern struct paravirt_patch_template pv_ops;
  * Generate some code, and mark it as patchable by the
  * apply_paravirt() alternate instruction patcher.
  */
-#define _paravirt_alt(insn_string, type, clobber) \
+#define _paravirt_alt(sec, insn_string, type, clobber) \
     "771:\n\t" insn_string "\n" "772:\n" \
-    ".pushsection .parainstructions,\"a\"\n" \
+    ".pushsection .parainstructions" sec ",\"a\"\n" \
     _ASM_ALIGN "\n" \
     _ASM_PTR " 771b\n" \
     " .byte " type "\n" \
@@ -361,8 +364,9 @@ extern struct paravirt_patch_template pv_ops;
     ".popsection\n"
 
 /* Generate patchable code, with the default asm parameters. */
-#define paravirt_alt(insn_string) \
-    _paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
+#define paravirt_alt(sec, insn_string) \
+    _paravirt_alt(sec, insn_string, "%c[paravirt_typenum]", \
+                  "%c[paravirt_clobber]")
 
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
 
@@ -414,7 +418,7 @@ int paravirt_disable_iospace(void);
  * unfortunately, are quite a bit (r8 - r11)
  *
  * The call instruction itself is marked by placing its start address
- * and size into the .parainstructions section, so that
+ * and size into the .parainstructions* sections, so that
  * apply_paravirt() in arch/i386/kernel/alternative.c can do the
  * appropriate patching under the control of the backend pv_init_ops
  * implementation.
@@ -512,7 +516,7 @@ int paravirt_disable_iospace(void);
     })
 
-#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, \
+#define ____PVOP_CALL(sec, rettype, op, clbr, call_clbr, extra_clbr, \
                       pre, post, ...) \
     ({ \
         rettype __ret; \
@@ -522,7 +526,7 @@ int paravirt_disable_iospace(void);
         /* since this condition will never hold */ \
         if (sizeof(rettype) > sizeof(unsigned long)) { \
             asm volatile(pre \
-                         paravirt_alt(PARAVIRT_CALL) \
+                         paravirt_alt(sec, PARAVIRT_CALL) \
                          post \
                          : call_clbr, ASM_CALL_CONSTRAINT \
                          : paravirt_type(op), \
@@ -532,7 +536,7 @@ int paravirt_disable_iospace(void);
             __ret = (rettype)((((u64)__edx) << 32) | __eax); \
         } else { \
             asm volatile(pre \
-                         paravirt_alt(PARAVIRT_CALL) \
+                         paravirt_alt(sec, PARAVIRT_CALL) \
                          post \
                          : call_clbr, ASM_CALL_CONSTRAINT \
                          : paravirt_type(op), \
@@ -544,22 +548,22 @@ int paravirt_disable_iospace(void);
         __ret; \
     })
 
-#define __PVOP_CALL(rettype, op, pre, post, ...) \
-    ____PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS, \
+#define __PVOP_CALL(sec, rettype, op, pre, post, ...) \
+    ____PVOP_CALL(sec, rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS, \
                   EXTRA_CLOBBERS, pre, post, ##__VA_ARGS__)
 
-#define __PVOP_CALLEESAVE(rettype, op, pre, post, ...) \
-    ____PVOP_CALL(rettype, op.func, CLBR_RET_REG, \
+#define __PVOP_CALLEESAVE(sec, rettype, op, pre, post, ...) \
+    ____PVOP_CALL(sec, rettype, op.func, CLBR_RET_REG, \
                   PVOP_CALLEE_CLOBBERS, , \
                   pre, post, ##__VA_ARGS__)
 
-#define ____PVOP_VCALL(op, clbr, call_clbr, extra_clbr, pre, post, ...) \
+#define ____PVOP_VCALL(sec, op, clbr, call_clbr, extra_clbr, pre, post, ...) \
     ({ \
         PVOP_VCALL_ARGS; \
         PVOP_TEST_NULL(op); \
         asm volatile(pre \
-                     paravirt_alt(PARAVIRT_CALL) \
+                     paravirt_alt(sec, PARAVIRT_CALL) \
                      post \
                      : call_clbr, ASM_CALL_CONSTRAINT \
                      : paravirt_type(op), \
@@ -568,85 +572,127 @@ int paravirt_disable_iospace(void);
                     : "memory", "cc" extra_clbr); \
     })
 
-#define __PVOP_VCALL(op, pre, post, ...) \
-    ____PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_CLOBBERS, \
+#define __PVOP_VCALL(sec, op, pre, post, ...) \
+    ____PVOP_VCALL(sec, op, CLBR_ANY, PVOP_VCALL_CLOBBERS, \
                    VEXTRA_CLOBBERS, \
                    pre, post, ##__VA_ARGS__)
 
-#define __PVOP_VCALLEESAVE(op, pre, post, ...) \
-    ____PVOP_VCALL(op.func, CLBR_RET_REG, \
+#define __PVOP_VCALLEESAVE(sec, op, pre, post, ...) \
+    ____PVOP_VCALL(sec, op.func, CLBR_RET_REG, \
                    PVOP_VCALLEE_CLOBBERS, , \
                    pre, post, ##__VA_ARGS__)
 
-#define PVOP_CALL0(rettype, op) \
-    __PVOP_CALL(rettype, op, "", "")
-#define PVOP_VCALL0(op) \
-    __PVOP_VCALL(op, "", "")
+#define _PVOP_CALL0(sec, rettype, op) \
+    __PVOP_CALL(sec, rettype, op, "", "")
+#define _PVOP_VCALL0(sec, op) \
+    __PVOP_VCALL(sec, op, "", "")
 
-#define PVOP_CALLEE0(rettype, op) \
-    __PVOP_CALLEESAVE(rettype, op, "", "")
-#define PVOP_VCALLEE0(op) \
-    __PVOP_VCALLEESAVE(op, "", "")
+#define _PVOP_CALLEE0(sec, rettype, op) \
+    __PVOP_CALLEESAVE(sec, rettype, op, "", "")
+#define _PVOP_VCALLEE0(sec, op) \
+    __PVOP_VCALLEESAVE(sec, op, "", "")
 
-#define PVOP_CALL1(rettype, op, arg1) \
-    __PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1))
-#define PVOP_VCALL1(op, arg1) \
-    __PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1))
+#define _PVOP_CALL1(sec, rettype, op, arg1) \
+    __PVOP_CALL(sec, rettype, op, "", "", PVOP_CALL_ARG1(arg1))
+#define _PVOP_VCALL1(sec, op, arg1) \
+    __PVOP_VCALL(sec, op, "", "", PVOP_CALL_ARG1(arg1))
 
-#define PVOP_CALLEE1(rettype, op, arg1) \
-    __PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1))
-#define PVOP_VCALLEE1(op, arg1) \
-    __PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1))
+#define _PVOP_CALLEE1(sec, rettype, op, arg1) \
+    __PVOP_CALLEESAVE(sec, rettype, op, "", "", PVOP_CALL_ARG1(arg1))
+#define _PVOP_VCALLEE1(sec, op, arg1) \
+    __PVOP_VCALLEESAVE(sec, op, "", "", PVOP_CALL_ARG1(arg1))
 
-
-#define PVOP_CALL2(rettype, op, arg1, arg2) \
-    __PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1), \
+#define _PVOP_CALL2(sec, rettype, op, arg1, arg2) \
+    __PVOP_CALL(sec, rettype, op, "", "", PVOP_CALL_ARG1(arg1), \
                 PVOP_CALL_ARG2(arg2))
-#define PVOP_VCALL2(op, arg1, arg2) \
-    __PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1), \
+#define _PVOP_VCALL2(sec, op, arg1, arg2) \
+    __PVOP_VCALL(sec, op, "", "", PVOP_CALL_ARG1(arg1), \
                  PVOP_CALL_ARG2(arg2))
 
-#define PVOP_CALLEE2(rettype, op, arg1, arg2) \
-    __PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1), \
+#define _PVOP_CALLEE2(sec, rettype, op, arg1, arg2) \
+    __PVOP_CALLEESAVE(sec, rettype, op, "", "", PVOP_CALL_ARG1(arg1), \
                       PVOP_CALL_ARG2(arg2))
-#define PVOP_VCALLEE2(op, arg1, arg2) \
-    __PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1), \
+#define _PVOP_VCALLEE2(sec, op, arg1, arg2) \
+    __PVOP_VCALLEESAVE(sec, op, "", "", PVOP_CALL_ARG1(arg1), \
                        PVOP_CALL_ARG2(arg2))
 
-#define PVOP_CALL3(rettype, op, arg1, arg2, arg3) \
-    __PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1), \
+#define _PVOP_CALL3(sec, rettype, op, arg1, arg2, arg3) \
+    __PVOP_CALL(sec, rettype, op, "", "", PVOP_CALL_ARG1(arg1), \
                 PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3))
-#define PVOP_VCALL3(op, arg1, arg2, arg3) \
-    __PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1), \
+#define _PVOP_VCALL3(sec, op, arg1, arg2, arg3) \
+    __PVOP_VCALL(sec, op, "", "", PVOP_CALL_ARG1(arg1), \
                  PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3))
 
 /* This is the only difference in x86_64. We can make it much simpler */
 #ifdef CONFIG_X86_32
-#define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \
-    __PVOP_CALL(rettype, op, \
+#define _PVOP_CALL4(sec, rettype, op, arg1, arg2, arg3, arg4) \
+    __PVOP_CALL(sec, rettype, op, \
                 "push %[_arg4];", "lea 4(%%esp),%%esp;", \
                 PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2), \
                 PVOP_CALL_ARG3(arg3), [_arg4] "mr" ((u32)(arg4)))
-#define PVOP_VCALL4(op, arg1, arg2, arg3, arg4) \
-    __PVOP_VCALL(op, \
+#define _PVOP_VCALL4(sec, op, arg1, arg2, arg3, arg4) \
+    __PVOP_VCALL(sec, op, \
                  "push %[_arg4];", "lea 4(%%esp),%%esp;", \
                  "0" ((u32)(arg1)), "1" ((u32)(arg2)), \
                  "2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4)))
 #else
-#define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \
-    __PVOP_CALL(rettype, op, "", "", \
+#define _PVOP_CALL4(sec, rettype, op, arg1, arg2, arg3, arg4) \
+    __PVOP_CALL(sec, rettype, op, "", "", \
                 PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2), \
                 PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4))
-#define PVOP_VCALL4(op, arg1, arg2, arg3, arg4) \
-    __PVOP_VCALL(op, "", "", \
+#define _PVOP_VCALL4(sec, op, arg1, arg2, arg3, arg4) \
+    __PVOP_VCALL(sec, op, "", "", \
                  PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2), \
                  PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4))
 #endif
 
+/*
+ * PVOP macros for .parainstructions
+ */
+#define PVOP_CALL0(rettype, op) \
+    _PVOP_CALL0(PV_SUFFIX, rettype, op)
+#define PVOP_VCALL0(op) \
+    _PVOP_VCALL0(PV_SUFFIX, op)
+
+#define PVOP_CALLEE0(rettype, op) \
+    _PVOP_CALLEE0(PV_SUFFIX, rettype, op)
+#define PVOP_VCALLEE0(op) \
+    _PVOP_VCALLEE0(PV_SUFFIX, op)
+
+#define PVOP_CALL1(rettype, op, arg1) \
+    _PVOP_CALL1(PV_SUFFIX, rettype, op, arg1)
+#define PVOP_VCALL1(op, arg1) \
+    _PVOP_VCALL1(PV_SUFFIX, op, arg1)
+
+#define PVOP_CALLEE1(rettype, op, arg1) \
+    _PVOP_CALLEE1(PV_SUFFIX, rettype, op, arg1)
+#define PVOP_VCALLEE1(op, arg1) \
+    _PVOP_VCALLEE1(PV_SUFFIX, op, arg1)
+
+#define PVOP_CALL2(rettype, op, arg1, arg2) \
+    _PVOP_CALL2(PV_SUFFIX, rettype, op, arg1, arg2)
+#define PVOP_VCALL2(op, arg1, arg2) \
+    _PVOP_VCALL2(PV_SUFFIX, op, arg1, arg2)
+
+#define PVOP_CALLEE2(rettype, op, arg1, arg2) \
+    _PVOP_CALLEE2(PV_SUFFIX, rettype, op, arg1, arg2)
+#define PVOP_VCALLEE2(op, arg1, arg2) \
+    _PVOP_VCALLEE2(PV_SUFFIX, op, arg1, arg2)
+
+#define PVOP_CALL3(rettype, op, arg1, arg2, arg3) \
+    _PVOP_CALL3(PV_SUFFIX, rettype, op, arg1, arg2, arg3)
+#define PVOP_VCALL3(op, arg1, arg2, arg3) \
+    _PVOP_VCALL3(PV_SUFFIX, op, arg1, arg2, arg3)
+
+#define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \
+    _PVOP_CALL4(PV_SUFFIX, rettype, op, arg1, arg2, arg3, arg4)
+#define PVOP_VCALL4(op, arg1, arg2, arg3, arg4) \
+    _PVOP_VCALL4(PV_SUFFIX, op, arg1, arg2, arg3, arg4)
+
 /* Lazy mode for batching updates / context switch */
 enum paravirt_lazy_mode {
     PARAVIRT_LAZY_NONE,
@@ -667,7 +713,7 @@ u64 _paravirt_ident_64(u64);
 
 #define paravirt_nop ((void *)_paravirt_nop)
 
-/* These all sit in the .parainstructions section to tell us what to patch. */
+/* These all sit in .parainstructions* sections to tell us what to patch. */
 struct paravirt_patch_site {
     u8 *instr;      /* original instructions */
     u8 type;        /* type of this instruction */

From patchwork Wed Apr 8 05:02:59 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479385
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 02/26] x86/paravirt: Allow paravirt patching post-init
Date: Tue, 7 Apr 2020 22:02:59 -0700
Message-Id: <20200408050323.4237-3-ankur.a.arora@oracle.com>

Paravirt-ops are patched at init to convert indirect calls into direct
calls and, in some cases, to inline the target at the call-site. This
is done by way of PVOP* macros, which save the call-site information
via compile-time annotations.

Pull this state out into .parainstructions.runtime for some pv-ops so
that it can be used for runtime patching.

Signed-off-by: Ankur Arora
---
 arch/x86/Kconfig                      | 12 ++++++++++++
 arch/x86/include/asm/paravirt_types.h |  5 +++++
 arch/x86/include/asm/text-patching.h  |  5 +++++
 arch/x86/kernel/alternative.c         |  2 ++
 arch/x86/kernel/module.c              | 10 +++++++++-
 arch/x86/kernel/vmlinux.lds.S         | 16 ++++++++++++++++
 include/asm-generic/vmlinux.lds.h     |  8 ++++++++
 7 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1edf788d301c..605619938f08 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -764,6 +764,18 @@ config PARAVIRT
       over full virtualization.  However, when run without a hypervisor
       the kernel is theoretically slower and slightly larger.
 
+config PARAVIRT_RUNTIME
+    bool "Enable paravirtualized ops to be patched at runtime"
+    depends on PARAVIRT
+    help
+      Enable the paravirtualized guest kernel to switch pv-ops based on
+      changed host conditions, potentially improving performance
+      significantly.
+
+      This would increase the memory footprint of the running kernel
+      slightly (depending mostly on whether lock and unlock are inlined
+      or not.)
+
 config PARAVIRT_XXL
     bool
 
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 37e8f27a3b9d..00e4a062ca10 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -723,6 +723,11 @@ struct paravirt_patch_site {
 extern struct paravirt_patch_site __parainstructions[],
     __parainstructions_end[];
 
+#ifdef CONFIG_PARAVIRT_RUNTIME
+extern struct paravirt_patch_site __parainstructions_runtime[],
+    __parainstructions_runtime_end[];
+#endif
+
 #endif  /* __ASSEMBLY__ */
 
 #endif  /* _ASM_X86_PARAVIRT_TYPES_H */
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 67315fa3956a..e2ef241c261e 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -18,6 +18,11 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
 #define __parainstructions_end  NULL
 #endif
 
+#ifndef CONFIG_PARAVIRT_RUNTIME
+#define __parainstructions_runtime      NULL
+#define __parainstructions_runtime_end  NULL
+#endif
+
 /*
  * Currently, the max observed size in the kernel code is
  * JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 7867dfb3963e..fdfda1375f82 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -740,6 +740,8 @@ void __init alternative_instructions(void)
 #endif
 
     apply_paravirt(__parainstructions, __parainstructions_end);
+    apply_paravirt(__parainstructions_runtime,
+                   __parainstructions_runtime_end);
 
     restart_nmi();
     alternatives_patched = 1;
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index d5c72cb877b3..658ea60ce324 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -222,7 +222,7 @@ int module_finalize(const Elf_Ehdr *hdr,
             struct module *me)
 {
     const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
-        *para = NULL, *orc = NULL, *orc_ip = NULL;
+        *para = NULL, *para_run = NULL, *orc = NULL, *orc_ip = NULL;
     char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
     for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -234,6 +234,9 @@ int module_finalize(const Elf_Ehdr *hdr,
             locks = s;
         if (!strcmp(".parainstructions", secstrings + s->sh_name))
             para = s;
+        if (!strcmp(".parainstructions.runtime",
+                    secstrings + s->sh_name))
+            para_run = s;
         if (!strcmp(".orc_unwind", secstrings + s->sh_name))
             orc = s;
         if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
@@ -257,6 +260,11 @@ int module_finalize(const Elf_Ehdr *hdr,
         void *pseg = (void *)para->sh_addr;
         apply_paravirt(pseg, pseg + para->sh_size);
     }
+    if (para_run) {
+        void *pseg = (void *)para_run->sh_addr;
+
+        apply_paravirt(pseg, pseg + para_run->sh_size);
+    }
 
     /* make jump label nops */
     jump_label_apply_nops(me);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1bf7e312361f..7f5b8f6ab96e 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -269,6 +269,7 @@ SECTIONS
     .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
         __parainstructions = .;
         *(.parainstructions)
+        PARAVIRT_DISCARD(.parainstructions.runtime)
         __parainstructions_end = .;
     }
 
@@ -348,6 +349,21 @@ SECTIONS
         __smp_locks_end = .;
     }
 
+#ifdef CONFIG_PARAVIRT_RUNTIME
+    /*
+     * .parainstructions.runtime sticks around in memory after
+     * init so it doesn't need to be page-aligned but everything
+     * around us is so we will be too.
+     */
+    . = ALIGN(8);
+    .parainstructions.runtime : AT(ADDR(.parainstructions.runtime) - \
+                                   LOAD_OFFSET) {
+        __parainstructions_runtime = .;
+        PARAVIRT_KEEP(.parainstructions.runtime)
+        __parainstructions_runtime_end = .;
+    }
+#endif
+
 #ifdef CONFIG_X86_64
     .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) {
         NOSAVE_DATA
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 71e387a5fe90..6b009d5ce51f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -135,6 +135,14 @@
 #define MEM_DISCARD(sec) *(.mem##sec)
 #endif
 
+#if defined(CONFIG_PARAVIRT_RUNTIME)
+#define PARAVIRT_KEEP(sec)      *(sec)
+#define PARAVIRT_DISCARD(sec)
+#else
+#define PARAVIRT_KEEP(sec)
+#define PARAVIRT_DISCARD(sec)   *(sec)
+#endif
+
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
 /*
  * The ftrace call sites are logged to a section whose name depends on the

From patchwork Wed Apr 8 05:03:00 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479407
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 03/26] x86/paravirt: PVRTOP macros for PARAVIRT_RUNTIME
Date: Tue, 7 Apr 2020 22:03:00 -0700
Message-Id: <20200408050323.4237-4-ankur.a.arora@oracle.com>

Define PVRT* macros which can be used to put pv-ops in
.parainstructions.runtime.
Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/paravirt_types.h | 49 +++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 00e4a062ca10..f1153f53c529 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -337,6 +337,12 @@ struct paravirt_patch_template {
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
 
+#ifdef CONFIG_PARAVIRT_RUNTIME
+#define PVRT_SUFFIX ".runtime"
+#else
+#define PVRT_SUFFIX ""
+#endif
+
 /* Sub-section for .parainstructions */
 #define PV_SUFFIX ""
 
@@ -693,6 +699,49 @@ int paravirt_disable_iospace(void);
 #define PVOP_VCALL4(op, arg1, arg2, arg3, arg4) \
     _PVOP_VCALL4(PV_SUFFIX, op, arg1, arg2, arg3, arg4)
 
+/*
+ * PVRTOP macros for .parainstructions.runtime
+ */
+#define PVRTOP_CALL0(rettype, op) \
+    _PVOP_CALL0(PVRT_SUFFIX, rettype, op)
+#define PVRTOP_VCALL0(op) \
+    _PVOP_VCALL0(PVRT_SUFFIX, op)
+
+#define PVRTOP_CALLEE0(rettype, op) \
+    _PVOP_CALLEE0(PVRT_SUFFIX, rettype, op)
+#define PVRTOP_VCALLEE0(op) \
+    _PVOP_VCALLEE0(PVRT_SUFFIX, op)
+
+#define PVRTOP_CALL1(rettype, op, arg1) \
+    _PVOP_CALL1(PVRT_SUFFIX, rettype, op, arg1)
+#define PVRTOP_VCALL1(op, arg1) \
+    _PVOP_VCALL1(PVRT_SUFFIX, op, arg1)
+
+#define PVRTOP_CALLEE1(rettype, op, arg1) \
+    _PVOP_CALLEE1(PVRT_SUFFIX, rettype, op, arg1)
+#define PVRTOP_VCALLEE1(op, arg1) \
+    _PVOP_VCALLEE1(PVRT_SUFFIX, op, arg1)
+
+#define PVRTOP_CALL2(rettype, op, arg1, arg2) \
+    _PVOP_CALL2(PVRT_SUFFIX, rettype, op, arg1, arg2)
+#define PVRTOP_VCALL2(op, arg1, arg2) \
+    _PVOP_VCALL2(PVRT_SUFFIX, op, arg1, arg2)
+
+#define PVRTOP_CALLEE2(rettype, op, arg1, arg2) \
+    _PVOP_CALLEE2(PVRT_SUFFIX, rettype, op, arg1, arg2)
+#define PVRTOP_VCALLEE2(op, arg1, arg2) \
+    _PVOP_VCALLEE2(PVRT_SUFFIX, op, arg1, arg2)
+
+#define PVRTOP_CALL3(rettype, op, arg1, arg2, arg3) \
+    _PVOP_CALL3(PVRT_SUFFIX, rettype, op, arg1, arg2, arg3)
+#define PVRTOP_VCALL3(op, arg1, arg2, arg3) \
+    _PVOP_VCALL3(PVRT_SUFFIX, op, arg1, arg2, arg3)
+
+#define PVRTOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \
+    _PVOP_CALL4(PVRT_SUFFIX, rettype, op, arg1, arg2, arg3, arg4)
+#define PVRTOP_VCALL4(op, arg1, arg2, arg3, arg4) \
+    _PVOP_VCALL4(PVRT_SUFFIX, op, arg1, arg2, arg3, arg4)
+
 /* Lazy mode for batching updates / context switch */
 enum paravirt_lazy_mode {
     PARAVIRT_LAZY_NONE,

From patchwork Wed Apr 8 05:03:01 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479411
IGGhpQ/2Bon2IuLfRqt9DwjBZXJ7PQf6SkIEadHvUmDYV4eToQQetFRQVdux8EYCGWin l4bX6zTIsPz9IeO2nXSUunQMy7sO4/t/ONz7K9g+FfZP5LYPCJQMbi+k5CH5htsMNWBq nM/zMdP6NNwLvJLu83EW/rx0DSl3EEyxneBAuHj+sm1CQ3GM4d717qkNjFOP/vZu6+yN Zy9Y59GZbOzSafi2urXoMJmSD4oLAPKhMSTKhg0MLR6pb+ZS74PmsNVFAya2wIY3SGoA qQ== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 3091m3914c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 08 Apr 2020 05:07:03 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03853Ksr158690; Wed, 8 Apr 2020 05:05:03 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3030.oracle.com with ESMTP id 3091m01fjb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 08 Apr 2020 05:05:03 +0000 Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 038551MG007321; Wed, 8 Apr 2020 05:05:01 GMT Received: from monad.ca.oracle.com (/10.156.75.81) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 07 Apr 2020 22:05:01 -0700 From: Ankur Arora To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora Subject: [RFC PATCH 04/26] x86/alternatives: Refactor alternatives_smp_module* Date: Tue, 7 Apr 2020 22:03:01 -0700 Message-Id: <20200408050323.4237-5-ankur.a.arora@oracle.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com> References: <20200408050323.4237-1-ankur.a.arora@oracle.com> MIME-Version: 1.0 
Refactor the alternatives_smp_module* logic to make it available for holding generic late-patching state. Most of the changes pull the module handling logic out from under CONFIG_SMP. In addition, alternatives_smp_module_add() is now called unconditionally, and the decision on whether to patch for UP is made there.
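The refactor above (calling the add path unconditionally and making the UP-patching decision inside it) can be sketched as a small user-space model. This is illustrative only, not the kernel code: the plain linked list, the `present_cpus` parameter, and the `module_count()` helper stand in for the kernel's `list_head` machinery and its `num_present_cpus()`/`setup_max_cpus` checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct alt_module {
    const char *name;
    struct alt_module *next;
};

static struct alt_module *alt_modules;   /* head of the module list */
static bool uniproc_patched;

/* The caller no longer decides; the "patch for UP?" choice lives here. */
void alternatives_module_add(const char *name, int present_cpus)
{
    struct alt_module *alt;

    if (present_cpus == 1)
        uniproc_patched = true;
    if (!uniproc_patched)
        return;                          /* nothing to remember */

    alt = calloc(1, sizeof(*alt));
    assert(alt);
    alt->name = name;
    alt->next = alt_modules;
    alt_modules = alt;
}

void alternatives_module_del(const char *name)
{
    struct alt_module **p = &alt_modules;

    while (*p) {
        if (strcmp((*p)->name, name) == 0) {
            struct alt_module *victim = *p;
            *p = victim->next;
            free(victim);
            return;
        }
        p = &(*p)->next;
    }
}

int module_count(void)
{
    int n = 0;
    for (struct alt_module *m = alt_modules; m; m = m->next)
        n++;
    return n;
}
```

The point of the shape is that module code (and the core kernel) always call the add path; the bookkeeping decides internally whether anything needs to be remembered.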
Signed-off-by: Ankur Arora --- arch/x86/include/asm/alternative.h | 13 ++----- arch/x86/kernel/alternative.c | 55 ++++++++++++++++-------------- 2 files changed, 32 insertions(+), 36 deletions(-) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 13adca37c99a..8235bbb746d9 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -75,24 +75,15 @@ extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); struct module; -#ifdef CONFIG_SMP extern void alternatives_smp_module_add(struct module *mod, char *name, void *locks, void *locks_end, void *text, void *text_end); extern void alternatives_smp_module_del(struct module *mod); -extern void alternatives_enable_smp(void); extern int alternatives_text_reserved(void *start, void *end); -extern bool skip_smp_alternatives; +#ifdef CONFIG_SMP +extern void alternatives_enable_smp(void); #else -static inline void alternatives_smp_module_add(struct module *mod, char *name, - void *locks, void *locks_end, - void *text, void *text_end) {} -static inline void alternatives_smp_module_del(struct module *mod) {} static inline void alternatives_enable_smp(void) {} -static inline int alternatives_text_reserved(void *start, void *end) -{ - return 0; -} #endif /* CONFIG_SMP */ #define b_replacement(num) "664"#num diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index fdfda1375f82..32aa1ddf441d 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -470,6 +470,13 @@ static void alternatives_smp_unlock(const s32 *start, const s32 *end, } } +static bool uniproc_patched; /* protected by text_mutex */ +#else /* !CONFIG_SMP */ +#define uniproc_patched false +static inline void alternatives_smp_unlock(const s32 *start, const s32 *end, + u8 *text, u8 *text_end) { } +#endif /* CONFIG_SMP */ + struct smp_alt_module { /* what is this ??? 
*/ struct module *mod; @@ -486,7 +493,6 @@ struct smp_alt_module { struct list_head next; }; static LIST_HEAD(smp_alt_modules); -static bool uniproc_patched = false; /* protected by text_mutex */ void __init_or_module alternatives_smp_module_add(struct module *mod, char *name, @@ -495,23 +501,27 @@ void __init_or_module alternatives_smp_module_add(struct module *mod, { struct smp_alt_module *smp; - mutex_lock(&text_mutex); +#ifdef CONFIG_SMP + /* Patch to UP if other cpus not imminent. */ + if (!noreplace_smp && (num_present_cpus() == 1 || setup_max_cpus <= 1)) + uniproc_patched = true; +#endif if (!uniproc_patched) - goto unlock; + return; - if (num_possible_cpus() == 1) - /* Don't bother remembering, we'll never have to undo it. */ - goto smp_unlock; + mutex_lock(&text_mutex); - smp = kzalloc(sizeof(*smp), GFP_KERNEL); - if (NULL == smp) - /* we'll run the (safe but slow) SMP code then ... */ - goto unlock; + smp = kzalloc(sizeof(*smp), GFP_KERNEL | __GFP_NOFAIL); smp->mod = mod; smp->name = name; - smp->locks = locks; - smp->locks_end = locks_end; + + if (num_possible_cpus() != 1 || uniproc_patched) { + /* Remember only if we'll need to undo it. 
*/ + smp->locks = locks; + smp->locks_end = locks_end; + } + smp->text = text; smp->text_end = text_end; DPRINTK("locks %p -> %p, text %p -> %p, name %s\n", @@ -519,9 +529,9 @@ void __init_or_module alternatives_smp_module_add(struct module *mod, smp->text, smp->text_end, smp->name); list_add_tail(&smp->next, &smp_alt_modules); -smp_unlock: - alternatives_smp_unlock(locks, locks_end, text, text_end); -unlock: + + if (uniproc_patched) + alternatives_smp_unlock(locks, locks_end, text, text_end); mutex_unlock(&text_mutex); } @@ -540,6 +550,7 @@ void __init_or_module alternatives_smp_module_del(struct module *mod) mutex_unlock(&text_mutex); } +#ifdef CONFIG_SMP void alternatives_enable_smp(void) { struct smp_alt_module *mod; @@ -561,6 +572,7 @@ void alternatives_enable_smp(void) } mutex_unlock(&text_mutex); } +#endif /* CONFIG_SMP */ /* * Return 1 if the address range is reserved for SMP-alternatives. @@ -588,7 +600,6 @@ int alternatives_text_reserved(void *start, void *end) return 0; } -#endif /* CONFIG_SMP */ #ifdef CONFIG_PARAVIRT void __init_or_module apply_paravirt(struct paravirt_patch_site *start, @@ -723,21 +734,15 @@ void __init alternative_instructions(void) apply_alternatives(__alt_instructions, __alt_instructions_end); -#ifdef CONFIG_SMP - /* Patch to UP if other cpus not imminent. 
*/ - if (!noreplace_smp && (num_present_cpus() == 1 || setup_max_cpus <= 1)) { - uniproc_patched = true; - alternatives_smp_module_add(NULL, "core kernel", - __smp_locks, __smp_locks_end, - _text, _etext); - } + alternatives_smp_module_add(NULL, "core kernel", + __smp_locks, __smp_locks_end, + _text, _etext); if (!uniproc_patched || num_possible_cpus() == 1) { free_init_pages("SMP alternatives", (unsigned long)__smp_locks, (unsigned long)__smp_locks_end); } -#endif apply_paravirt(__parainstructions, __parainstructions_end); apply_paravirt(__parainstructions_runtime,

From patchwork Wed Apr 8 05:03:02 2020
From: Ankur Arora
Subject: [RFC PATCH 05/26] x86/alternatives: Rename alternatives_smp*, smp_alt_module
Date: Tue, 7 Apr 2020 22:03:02 -0700
Message-Id: <20200408050323.4237-6-ankur.a.arora@oracle.com>

Rename alternatives_smp_module_*(), smp_alt_module to reflect their new purpose.
Signed-off-by: Ankur Arora --- arch/x86/include/asm/alternative.h | 10 +++--- arch/x86/kernel/alternative.c | 54 +++++++++++++++--------------- arch/x86/kernel/module.c | 8 ++--- 3 files changed, 36 insertions(+), 36 deletions(-) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 8235bbb746d9..db91a7731d87 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -75,11 +75,11 @@ extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); struct module; -extern void alternatives_smp_module_add(struct module *mod, char *name, - void *locks, void *locks_end, - void *text, void *text_end); -extern void alternatives_smp_module_del(struct module *mod); -extern int alternatives_text_reserved(void *start, void *end); +void alternatives_module_add(struct module *mod, char *name, + void *locks, void *locks_end, + void *text, void *text_end); +void alternatives_module_del(struct module *mod); +int alternatives_text_reserved(void *start, void *end); #ifdef CONFIG_SMP extern void alternatives_enable_smp(void); #else diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 32aa1ddf441d..4157f848b537 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -477,7 +477,7 @@ static inline void alternatives_smp_unlock(const s32 *start, const s32 *end, u8 *text, u8 *text_end) { } #endif /* CONFIG_SMP */ -struct smp_alt_module { +struct alt_module { /* what is this ??? 
*/ struct module *mod; char *name; @@ -492,14 +492,14 @@ struct smp_alt_module { struct list_head next; }; -static LIST_HEAD(smp_alt_modules); -void __init_or_module alternatives_smp_module_add(struct module *mod, - char *name, - void *locks, void *locks_end, - void *text, void *text_end) +static LIST_HEAD(alt_modules); + +void __init_or_module alternatives_module_add(struct module *mod, char *name, + void *locks, void *locks_end, + void *text, void *text_end) { - struct smp_alt_module *smp; + struct alt_module *alt; #ifdef CONFIG_SMP /* Patch to UP if other cpus not imminent. */ @@ -511,36 +511,36 @@ void __init_or_module alternatives_smp_module_add(struct module *mod, mutex_lock(&text_mutex); - smp = kzalloc(sizeof(*smp), GFP_KERNEL | __GFP_NOFAIL); + alt = kzalloc(sizeof(*alt), GFP_KERNEL | __GFP_NOFAIL); - smp->mod = mod; - smp->name = name; + alt->mod = mod; + alt->name = name; if (num_possible_cpus() != 1 || uniproc_patched) { /* Remember only if we'll need to undo it. */ - smp->locks = locks; - smp->locks_end = locks_end; + alt->locks = locks; + alt->locks_end = locks_end; } - smp->text = text; - smp->text_end = text_end; + alt->text = text; + alt->text_end = text_end; DPRINTK("locks %p -> %p, text %p -> %p, name %s\n", - smp->locks, smp->locks_end, - smp->text, smp->text_end, smp->name); + alt->locks, alt->locks_end, + alt->text, alt->text_end, alt->name); - list_add_tail(&smp->next, &smp_alt_modules); + list_add_tail(&alt->next, &alt_modules); if (uniproc_patched) alternatives_smp_unlock(locks, locks_end, text, text_end); mutex_unlock(&text_mutex); } -void __init_or_module alternatives_smp_module_del(struct module *mod) +void __init_or_module alternatives_module_del(struct module *mod) { - struct smp_alt_module *item; + struct alt_module *item; mutex_lock(&text_mutex); - list_for_each_entry(item, &smp_alt_modules, next) { + list_for_each_entry(item, &alt_modules, next) { if (mod != item->mod) continue; list_del(&item->next); @@ -553,7 +553,7 @@ void 
__init_or_module alternatives_smp_module_del(struct module *mod) #ifdef CONFIG_SMP void alternatives_enable_smp(void) { - struct smp_alt_module *mod; + struct alt_module *mod; /* Why bother if there are no other CPUs? */ BUG_ON(num_possible_cpus() == 1); @@ -565,7 +565,7 @@ void alternatives_enable_smp(void) BUG_ON(num_online_cpus() != 1); clear_cpu_cap(&boot_cpu_data, X86_FEATURE_UP); clear_cpu_cap(&cpu_data(0), X86_FEATURE_UP); - list_for_each_entry(mod, &smp_alt_modules, next) + list_for_each_entry(mod, &alt_modules, next) alternatives_smp_lock(mod->locks, mod->locks_end, mod->text, mod->text_end); uniproc_patched = false; @@ -580,14 +580,14 @@ void alternatives_enable_smp(void) */ int alternatives_text_reserved(void *start, void *end) { - struct smp_alt_module *mod; + struct alt_module *mod; const s32 *poff; u8 *text_start = start; u8 *text_end = end; lockdep_assert_held(&text_mutex); - list_for_each_entry(mod, &smp_alt_modules, next) { + list_for_each_entry(mod, &alt_modules, next) { if (mod->text > text_end || mod->text_end < text_start) continue; for (poff = mod->locks; poff < mod->locks_end; poff++) { @@ -734,9 +734,9 @@ void __init alternative_instructions(void) apply_alternatives(__alt_instructions, __alt_instructions_end); - alternatives_smp_module_add(NULL, "core kernel", - __smp_locks, __smp_locks_end, - _text, _etext); + alternatives_module_add(NULL, "core kernel", + __smp_locks, __smp_locks_end, + _text, _etext); if (!uniproc_patched || num_possible_cpus() == 1) { free_init_pages("SMP alternatives", diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index 658ea60ce324..fc3d35198b09 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -251,9 +251,9 @@ int module_finalize(const Elf_Ehdr *hdr, if (locks && text) { void *lseg = (void *)locks->sh_addr; void *tseg = (void *)text->sh_addr; - alternatives_smp_module_add(me, me->name, - lseg, lseg + locks->sh_size, - tseg, tseg + text->sh_size); + alternatives_module_add(me, 
me->name, + lseg, lseg + locks->sh_size, + tseg, tseg + text->sh_size); } if (para) { @@ -278,5 +278,5 @@ int module_finalize(const Elf_Ehdr *hdr, void module_arch_cleanup(struct module *mod) { - alternatives_smp_module_del(mod); + alternatives_module_del(mod); }

From patchwork Wed Apr 8 05:03:03 2020
From: Ankur Arora
Subject: [RFC PATCH 06/26] x86/alternatives: Remove stale symbols
Date: Tue, 7 Apr 2020 22:03:03 -0700
Message-Id: <20200408050323.4237-7-ankur.a.arora@oracle.com>

__start_parainstructions and __stop_parainstructions aren't defined; remove them.

Signed-off-by: Ankur Arora --- arch/x86/kernel/alternative.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 4157f848b537..09e4ee0e09a2 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -623,8 +623,6 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start, text_poke_early(p->instr, insn_buff, p->len); } } -extern struct paravirt_patch_site __start_parainstructions[], - __stop_parainstructions[]; #endif /* CONFIG_PARAVIRT */ /*

From patchwork Wed Apr 8 05:03:04 2020
From: Ankur Arora
Subject: [RFC PATCH 07/26] x86/paravirt: Persist .parainstructions.runtime
Date: Tue, 7 Apr 2020 22:03:04 -0700
Message-Id: <20200408050323.4237-8-ankur.a.arora@oracle.com>

Persist .parainstructions.runtime in memory. We will use it to patch paravirt-ops at runtime.
The extra memory footprint depends on chosen config options but the inlined queued_spin_unlock() presents an edge case:

$ objdump -h vmlinux|grep .parainstructions
Idx Name                       Size     VMA              LMA              File-off Algn
 27 .parainstructions          0001013c ffffffff82895000 0000000002895000 01c95000 2**3
 28 .parainstructions.runtime  0000cd2c ffffffff828a5140 00000000028a5140 01ca5140 2**3

(The added footprint is the size of the .parainstructions.runtime section.)

$ size vmlinux
    text     data      bss      dec     hex filename
13726196 12302814 14094336 40123346 2643bd2 vmlinux

Signed-off-by: Ankur Arora --- arch/x86/include/asm/alternative.h | 1 + arch/x86/kernel/alternative.c | 16 +++++++++++++++- arch/x86/kernel/module.c | 28 +++++++++++++++++++++++----- 3 files changed, 39 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index db91a7731d87..d19546c14ff6 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -76,6 +76,7 @@ extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); struct module; void alternatives_module_add(struct module *mod, char *name, + void *para, void *para_end, void *locks, void *locks_end, void *text, void *text_end); void alternatives_module_del(struct module *mod); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 09e4ee0e09a2..8189ac21624c 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -482,6 +482,12 @@ struct alt_module { struct module *mod; char *name; +#ifdef CONFIG_PARAVIRT_RUNTIME + /* ptrs to paravirt sites */ + struct paravirt_patch_site *para; + struct paravirt_patch_site *para_end; +#endif + /* ptrs to lock prefixes */ const s32 *locks; const s32 *locks_end; @@ -496,6 +502,7 @@ struct alt_module { static LIST_HEAD(alt_modules); void __init_or_module alternatives_module_add(struct module *mod, char *name, + void *para, void *para_end, void *locks, void *locks_end, void *text, void
*text_end) { @@ -506,7 +513,7 @@ void __init_or_module alternatives_module_add(struct module *mod, char *name, if (!noreplace_smp && (num_present_cpus() == 1 || setup_max_cpus <= 1)) uniproc_patched = true; #endif - if (!uniproc_patched) + if (!IS_ENABLED(CONFIG_PARAVIRT_RUNTIME) && !uniproc_patched) return; mutex_lock(&text_mutex); @@ -516,6 +523,11 @@ void __init_or_module alternatives_module_add(struct module *mod, char *name, alt->mod = mod; alt->name = name; +#ifdef CONFIG_PARAVIRT_RUNTIME + alt->para = para; + alt->para_end = para_end; +#endif + if (num_possible_cpus() != 1 || uniproc_patched) { /* Remember only if we'll need to undo it. */ alt->locks = locks; @@ -733,6 +745,8 @@ void __init alternative_instructions(void) apply_alternatives(__alt_instructions, __alt_instructions_end); alternatives_module_add(NULL, "core kernel", + __parainstructions_runtime, + __parainstructions_runtime_end, __smp_locks, __smp_locks_end, _text, _etext); diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index fc3d35198b09..7b2632184c11 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -248,12 +248,30 @@ int module_finalize(const Elf_Ehdr *hdr, void *aseg = (void *)alt->sh_addr; apply_alternatives(aseg, aseg + alt->sh_size); } - if (locks && text) { - void *lseg = (void *)locks->sh_addr; - void *tseg = (void *)text->sh_addr; + if (para_run || (locks && text)) { + void *pseg, *pseg_end; + void *lseg, *lseg_end; + void *tseg, *tseg_end; + + pseg = pseg_end = NULL; + lseg = lseg_end = NULL; + tseg = tseg_end = NULL; + if (para_run) { + pseg = (void *)para_run->sh_addr; + pseg_end = pseg + para_run->sh_size; + } + + if (locks && text) { + tseg = (void *)text->sh_addr; + tseg_end = tseg + text->sh_size; + + lseg = (void *)locks->sh_addr; + lseg_end = lseg + locks->sh_size; + } alternatives_module_add(me, me->name, - lseg, lseg + locks->sh_size, - tseg, tseg + text->sh_size); + pseg, pseg_end, + lseg, lseg_end, + tseg, tseg_end); } if (para) { 
From patchwork Wed Apr 8 05:03:05 2020
From: Ankur Arora
Subject: [RFC PATCH 08/26] x86/paravirt: Stash native pv-ops
Date: Tue, 7 Apr 2020 22:03:05 -0700
Message-Id: <20200408050323.4237-9-ankur.a.arora@oracle.com>
adultscore=0 mlxlogscore=999 mlxscore=0 priorityscore=1501 phishscore=0 suspectscore=0 bulkscore=0 lowpriorityscore=0 impostorscore=0 malwarescore=0 clxscore=1015 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004080037 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Introduce native_pv_ops where we stash the pv_ops array before hypervisor specific hooks have modified it. native_pv_ops get used when switching between paravirt and native pv-ops at runtime. Signed-off-by: Ankur Arora --- arch/x86/include/asm/paravirt_types.h | 4 ++++ arch/x86/kernel/paravirt.c | 10 ++++++++++ arch/x86/kernel/setup.c | 2 ++ 3 files changed, 16 insertions(+) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index f1153f53c529..bc935eec7ec6 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -339,6 +339,7 @@ extern struct paravirt_patch_template pv_ops; #ifdef CONFIG_PARAVIRT_RUNTIME #define PVRT_SUFFIX ".runtime" +extern struct paravirt_patch_template native_pv_ops; #else #define PVRT_SUFFIX "" #endif @@ -775,6 +776,9 @@ extern struct paravirt_patch_site __parainstructions[], #ifdef CONFIG_PARAVIRT_RUNTIME extern struct paravirt_patch_site __parainstructions_runtime[], __parainstructions_runtime_end[]; +void paravirt_ops_init(void); +#else +static inline void paravirt_ops_init(void) { } #endif #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index c131ba4e70ef..8c511cc4d4f4 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -458,5 +458,15 @@ NOKPROBE_SYMBOL(native_set_debugreg); NOKPROBE_SYMBOL(native_load_idt); #endif +#ifdef CONFIG_PARAVIRT_RUNTIME +__ro_after_init struct paravirt_patch_template native_pv_ops; + +void __init paravirt_ops_init(void) +{ + native_pv_ops = pv_ops; +} +EXPORT_SYMBOL(native_pv_ops); +#endif + 
 EXPORT_SYMBOL(pv_ops);
 EXPORT_SYMBOL_GPL(pv_info);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e6b545047f38..2746a6a78fe7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include

 /*
  * max_low_pfn_mapped: highest directly mapped pfn < 4 GB
@@ -831,6 +832,7 @@ void __init setup_arch(char **cmdline_p)
 	boot_cpu_data.x86_phys_bits = MAX_PHYSMEM_BITS;
 #endif

+	paravirt_ops_init();
 	/*
 	 * If we have OLPC OFW, we might end up relocating the fixmap due to
 	 * reserve_top(), so do this before touching the ioremap area.

From patchwork Wed Apr 8 05:03:06 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479375
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 09/26] x86/paravirt: Add runtime_patch()
Date: Tue, 7 Apr 2020 22:03:06 -0700
Message-Id: <20200408050323.4237-10-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

runtime_patch() generates insn sequences for patching supported pv_ops.
It does this by calling paravirt_patch_default() or native_patch(),
depending on whether the target is a paravirt or native pv-op.

In addition, runtime_patch() also whitelists pv-ops that are safe to
patch at runtime.

The static conditions that need to be satisfied to patch safely:

 - Insn sequences under replacement need to execute without preemption.
   This is meant to avoid scenarios where a call-site (ex.
   lock.vcpu_is_preempted) switches between the following sequences:

   lock.vcpu_is_preempted = __raw_callee_save___kvm_vcpu_is_preempted
     0: e8 31 e6 ff ff	callq  0xffffffffffffe636
     4: 66 90		xchg   %ax,%ax		# NOP2

   lock.vcpu_is_preempted = __raw_callee_save___native_vcpu_is_preempted
     0: 31 c0		xor    %rax, %rax
     2: 0f 1f 44 00 00	nopl   0x0(%rax)	# NOP5

   If kvm_vcpu_is_preempted() were preemptible, then, post patching
   we would return to address 4 above, which is in the middle of an
   instruction for native_vcpu_is_preempted().
   Even if this were to be made safe (ex. by changing the NOP2 to be a
   prefix instead of a suffix), it would still not be enough -- since we
   do not want any code from the switched-out pv-op to be executing
   after the pv-op has been switched out.

 - Entered only at the beginning: this allows us to use text_poke()
   which uses INT3 as a barrier.

We don't store the address inside any call-sites so the second can be
assumed.

Guaranteeing the first condition boils down to stating that any pv-op
being patched cannot be present/referenced from any call-stack in the
system. pv-ops that are not obviously non-preemptible need to be
enclosed in preempt_disable_runtime_patch()/preempt_enable_runtime_patch().

This should be sufficient because runtime_patch() itself is called from
a stop_machine() context, which would be enough to flush out any
non-preemptible sequences.

Note that preemption in the host is okay: stop_machine() would unwind
any pv-ops sleeping in the host.

Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/paravirt_types.h |  8 +++++
 arch/x86/kernel/paravirt.c            |  6 +---
 arch/x86/kernel/paravirt_patch.c      | 49 +++++++++++++++++++++++++++
 include/linux/preempt.h               | 17 ++++++++++
 4 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index bc935eec7ec6..3b9f6c105397 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -350,6 +350,12 @@ extern struct paravirt_patch_template native_pv_ops;
 #define PARAVIRT_PATCH(x) \
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))

+/*
+ * Neat trick to map patch type back to the call within the
+ * corresponding structure.
+ */
+#define PARAVIRT_PATCH_OP(ops, type) (*(long *)(&((long **)&(ops))[type]))
+
 #define paravirt_type(op) \
 	[paravirt_typenum] "i" (PARAVIRT_PATCH(op)), \
 	[paravirt_opptr] "i" (&(pv_ops.op))
@@ -383,6 +389,8 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, un
 unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char *start, const char *end);

 unsigned native_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
+int runtime_patch(u8 type, void *insn_buff, void *op, unsigned long addr,
+		  unsigned int len);

 int paravirt_disable_iospace(void);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 8c511cc4d4f4..c4128436b05a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -117,11 +117,7 @@ void __init native_pv_lock_init(void)
 unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, unsigned len)
 {
-	/*
-	 * Neat trick to map patch type back to the call within the
-	 * corresponding structure.
-	 */
-	void *opfunc = *((void **)&pv_ops + type);
+	void *opfunc = (void *)PARAVIRT_PATCH_OP(pv_ops, type);
 	unsigned ret;

 	if (opfunc == NULL)
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
index 3eff63c090d2..3eb8c0e720b4 100644
--- a/arch/x86/kernel/paravirt_patch.c
+++ b/arch/x86/kernel/paravirt_patch.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
+#include

 #include
 #include
@@ -124,3 +125,51 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,

 	return paravirt_patch_default(type, insn_buff, addr, len);
 }
+
+#ifdef CONFIG_PARAVIRT_RUNTIME
+/**
+ * runtime_patch - Generate patching code for a native/paravirt op
+ * @type: op type to generate code for
+ * @insn_buff: destination buffer
+ * @op: op target
+ * @addr: call site address
+ * @len: length of insn_buff
+ *
+ * Note that pv-ops are only suitable for runtime patching if they are
+ * non-preemptible. This is necessary for two reasons: we don't want to
+ * be overwriting insn sequences which might be referenced from call-stacks
+ * (and thus would be returned to), and we want patching to act as a barrier
+ * so no code from now stale paravirt ops should execute after an op has
+ * changed.
+ *
+ * Return: size of insn sequence on success, -EINVAL on error.
+ */
+int runtime_patch(u8 type, void *insn_buff, void *op,
+		  unsigned long addr, unsigned int len)
+{
+	void *native_op;
+	int used = 0;
+
+	/* Nothing whitelisted for now. */
+	switch (type) {
+	default:
+		pr_warn("type=%d unsuitable for runtime-patching\n", type);
+		return -EINVAL;
+	}
+
+	if (PARAVIRT_PATCH_OP(pv_ops, type) != (long)op)
+		PARAVIRT_PATCH_OP(pv_ops, type) = (long)op;
+
+	native_op = (void *)PARAVIRT_PATCH_OP(native_pv_ops, type);
+
+	/*
+	 * Use native_patch() to get the right insns if we are switching
+	 * back to a native_op.
+	 */
+	if (op == native_op)
+		used = native_patch(type, insn_buff, addr, len);
+	else
+		used = paravirt_patch_default(type, insn_buff, addr, len);
+	return used;
+}
+#endif /* CONFIG_PARAVIRT_RUNTIME */
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index bc3f1aecaa19..c569d077aab2 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -203,6 +203,13 @@ do { \
 	__preempt_schedule(); \
 } while (0)

+/*
+ * preempt_enable_no_resched() so we don't add any preemption points until
+ * after the caller has returned.
+ */
+#define preempt_enable_runtime_patch()	preempt_enable_no_resched()
+#define preempt_disable_runtime_patch()	preempt_disable()
+
 #else /* !CONFIG_PREEMPTION */
 #define preempt_enable() \
 do { \
@@ -217,6 +224,12 @@ do { \
 } while (0)

 #define preempt_check_resched() do { } while (0)
+
+/*
+ * NOP, if there's no preemption.
+ */
+#define preempt_disable_runtime_patch()	do { } while (0)
+#define preempt_enable_runtime_patch()	do { } while (0)
 #endif /* CONFIG_PREEMPTION */

 #define preempt_disable_notrace() \
@@ -250,6 +263,8 @@ do { \
 #define preempt_enable_notrace() barrier()
 #define preemptible() 0

+#define preempt_disable_runtime_patch()	do { } while (0)
+#define preempt_enable_runtime_patch()	do { } while (0)
 #endif /* CONFIG_PREEMPT_COUNT */

 #ifdef MODULE
@@ -260,6 +275,8 @@ do { \
 #undef preempt_enable_no_resched
 #undef preempt_enable_no_resched_notrace
 #undef preempt_check_resched
+#undef preempt_disable_runtime_patch
+#undef preempt_enable_runtime_patch
 #endif

 #define preempt_set_need_resched() \

From patchwork Wed Apr 8 05:03:07 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479413
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 10/26] x86/paravirt: Add primitives to stage pv-ops
Date: Tue, 7 Apr 2020 22:03:07 -0700
Message-Id:
<20200408050323.4237-11-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

Add paravirt_stage_alt(), which conditionally selects between a
paravirt or native pv-op and then stages it for later patching.

Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/paravirt_types.h |  6 +++
 arch/x86/include/asm/text-patching.h  |  3 ++
 arch/x86/kernel/alternative.c         | 58 +++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 3b9f6c105397..0c4ca7ad719c 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -350,6 +350,12 @@ extern struct paravirt_patch_template native_pv_ops;
 #define PARAVIRT_PATCH(x) \
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))

+#define paravirt_stage_alt(do_stage, op, opfn)		\
+	(text_poke_pv_stage(PARAVIRT_PATCH(op),		\
+			    (do_stage) ? (opfn) : (native_pv_ops.op)))
+
+#define paravirt_stage_zero() text_poke_pv_stage_zero()
+
 /*
  * Neat trick to map patch type back to the call within the
  * corresponding structure.
  */
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e2ef241c261e..706e61e6967d 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -55,6 +55,9 @@ extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void
 extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate);
 extern void text_poke_finish(void);

+bool text_poke_pv_stage(u8 type, void *opfn);
+void text_poke_pv_stage_zero(void);
+
 #define INT3_INSN_SIZE		1
 #define INT3_INSN_OPCODE	0xCC
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8189ac21624c..0c335af9ee28 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1307,3 +1307,61 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
+
+#ifdef CONFIG_PARAVIRT_RUNTIME
+struct paravirt_stage_entry {
+	void *dest;	/* pv_op destination */
+	u8 type;	/* pv_op type */
+};
+
+/*
+ * We don't anticipate many pv-ops being written at runtime.
+ */
+#define PARAVIRT_STAGE_MAX 8
+struct paravirt_stage {
+	struct paravirt_stage_entry ops[PARAVIRT_STAGE_MAX];
+	u32 count;
+};
+
+/* Protected by text_mutex */
+static struct paravirt_stage pv_stage;
+
+/**
+ * text_poke_pv_stage - Stage paravirt-op for poking.
+ * @addr: address in struct paravirt_patch_template
+ * @type: pv-op type
+ * @opfn: destination of the pv-op
+ *
+ * Return: staging status.
+ */
+bool text_poke_pv_stage(u8 type, void *opfn)
+{
+	if (system_state == SYSTEM_BOOTING) { /* Passthrough */
+		PARAVIRT_PATCH_OP(pv_ops, type) = (long)opfn;
+		goto out;
+	}
+
+	lockdep_assert_held(&text_mutex);
+
+	if (PARAVIRT_PATCH_OP(pv_ops, type) == (long)opfn)
+		goto out;
+
+	if (pv_stage.count >= PARAVIRT_STAGE_MAX)
+		goto out;
+
+	pv_stage.ops[pv_stage.count].type = type;
+	pv_stage.ops[pv_stage.count].dest = opfn;
+
+	pv_stage.count++;
+
+	return true;
+out:
+	return false;
+}
+
+void text_poke_pv_stage_zero(void)
+{
+	lockdep_assert_held(&text_mutex);
+	pv_stage.count = 0;
+}
+#endif /* CONFIG_PARAVIRT_RUNTIME */

From patchwork Wed Apr 8 05:03:08 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479377
content-transfer-encoding; s=corp-2020-01-29; bh=kG41Wdi3xZl8bLDCQdMVBGXBcfy0aVqVcUlxPKtO2Qo=; b=xWXrvfUdB9tuJuQ2V9qdp8V5Bz/bz2ImxUQMZguPeBGwXxeXQYy1NSxRzY5PefUFgpCi DE348mVzKuUk1tMCueBWHccBpShi3itMcbUNJlD7HQhzAmsa08nAPhzy6973ZKTzTfVQ WbbNP2PNksH1Xt5HpceILIVsYBjThZn8GmaQPXJ7/cVAn3dWpMcXfJ+nzcncLVFMkK/s LunZLiI+/yDjmOTe811yfyRavy6AoS80i5en5ECYf7zgvFKOwGItBp1etljPK9ypTOOE WlgQIgH+NA20KP2M/zAXvrlZiP8WfH2UQlOeF4HKb5cnflfwZj4/IurCxH++yE9rr9eR uA== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 3091mnh14d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 08 Apr 2020 05:05:16 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03853Lv7159055; Wed, 8 Apr 2020 05:05:16 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3030.oracle.com with ESMTP id 3091m01fwa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 08 Apr 2020 05:05:16 +0000 Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 03855FqY030510; Wed, 8 Apr 2020 05:05:15 GMT Received: from monad.ca.oracle.com (/10.156.75.81) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 07 Apr 2020 22:05:14 -0700 From: Ankur Arora To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora Subject: [RFC PATCH 11/26] x86/alternatives: Remove return value of text_poke*() Date: Tue, 7 Apr 2020 22:03:08 -0700 Message-Id: <20200408050323.4237-12-ankur.a.arora@oracle.com> X-Mailer: git-send-email 
2.20.1 In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com> References: <20200408050323.4237-1-ankur.a.arora@oracle.com> MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9584 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 malwarescore=0 mlxlogscore=813 phishscore=0 spamscore=0 adultscore=0 suspectscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004080037 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9584 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 mlxlogscore=873 mlxscore=0 priorityscore=1501 bulkscore=0 adultscore=0 impostorscore=0 phishscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004080037 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Various text_poke() variants don't return a useful value. Remove it. Signed-off-by: Ankur Arora --- arch/x86/include/asm/text-patching.h | 4 ++-- arch/x86/kernel/alternative.c | 11 +++++------ 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index 706e61e6967d..04778c2bc34e 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -46,9 +46,9 @@ extern void text_poke_early(void *addr, const void *opcode, size_t len); * On the local CPU you need to be protected against NMI or MCE handlers seeing * an inconsistent instruction while you patch. 
*/ -extern void *text_poke(void *addr, const void *opcode, size_t len); +extern void text_poke(void *addr, const void *opcode, size_t len); extern void text_poke_sync(void); -extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); +extern void text_poke_kgdb(void *addr, const void *opcode, size_t len); extern int poke_int3_handler(struct pt_regs *regs); extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 0c335af9ee28..8c79a3dc5e72 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -805,7 +805,7 @@ void __init_or_module text_poke_early(void *addr, const void *opcode, __ro_after_init struct mm_struct *poking_mm; __ro_after_init unsigned long poking_addr; -static void *__text_poke(void *addr, const void *opcode, size_t len) +static void __text_poke(void *addr, const void *opcode, size_t len) { bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE; struct page *pages[2] = {NULL}; @@ -906,7 +906,6 @@ static void *__text_poke(void *addr, const void *opcode, size_t len) pte_unmap_unlock(ptep, ptl); local_irq_restore(flags); - return addr; } /** @@ -925,11 +924,11 @@ static void *__text_poke(void *addr, const void *opcode, size_t len) * by registering a module notifier, and ordering module removal and patching * trough a mutex. */ -void *text_poke(void *addr, const void *opcode, size_t len) +void text_poke(void *addr, const void *opcode, size_t len) { lockdep_assert_held(&text_mutex); - return __text_poke(addr, opcode, len); + __text_poke(addr, opcode, len); } /** @@ -946,9 +945,9 @@ void *text_poke(void *addr, const void *opcode, size_t len) * Context: should only be used by kgdb, which ensures no other core is running, * despite the fact it does not hold the text_mutex. 
*/ -void *text_poke_kgdb(void *addr, const void *opcode, size_t len) +void text_poke_kgdb(void *addr, const void *opcode, size_t len) { - return __text_poke(addr, opcode, len); + __text_poke(addr, opcode, len); } static void do_sync_core(void *info) From patchwork Wed Apr 8 05:03:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ankur Arora X-Patchwork-Id: 11479325 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5EC2592C for ; Wed, 8 Apr 2020 05:05:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 332A02083E for ; Wed, 8 Apr 2020 05:05:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="vI5EI8HO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726650AbgDHFFd (ORCPT ); Wed, 8 Apr 2020 01:05:33 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:52586 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726575AbgDHFFa (ORCPT ); Wed, 8 Apr 2020 01:05:30 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03854h69013027; Wed, 8 Apr 2020 05:05:18 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=corp-2020-01-29; bh=8QxNSgJ0TIVxhxX6u30CrqzHaWwQp1Js8M95AGUtpXc=; b=vI5EI8HOwXwI5eQetVqe1UeC1FDR5h2Wq2Yc4V02ynbY+QJ5s5gshLf87NcS/xzV5QOo Gxmf6Vle0gmY8WBM7etujGZBUqGtXtK1X6cQzgXXz0qUt/rOZqrM4I7h/EmNpLtA8kVU rXTBt/ChGhsl6vAhJJASdRwQwK0zmTX9YsmKvblCa/Y0h7xD3BPSovfFcl/75PqYbkD7 gBYSc9aJ+v3Rzov2jFcgMCyVPXAvnQL5E3hd1TQJY93DhmCkgtqATrczQaTw63J/BWNy 
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 12/26] x86/alternatives: Use __get_unlocked_pte() in text_poke()
Date: Tue, 7 Apr 2020 22:03:09 -0700
Message-Id: <20200408050323.4237-13-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>
text_poke() uses get_locked_pte() to map poking_addr. However, this
introduces a dependency on locking code, which precludes using
text_poke() to modify qspinlock primitives.

Accesses to this pte (and to poking_addr) are protected by text_mutex,
so we can safely switch to __get_unlocked_pte() here. Note that we do
need to be careful not to modify poking_addr from multiple contexts
simultaneously (ex. INT3 or NMI context.)

Signed-off-by: Ankur Arora
---
 arch/x86/kernel/alternative.c |  9 ++++-----
 include/linux/mm.h            | 16 ++++++++++++++--
 mm/memory.c                   |  9 ++++++---
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8c79a3dc5e72..0344e49a4ade 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -812,7 +812,6 @@ static void __text_poke(void *addr, const void *opcode, size_t len)
 	temp_mm_state_t prev;
 	unsigned long flags;
 	pte_t pte, *ptep;
-	spinlock_t *ptl;
 	pgprot_t pgprot;

 	/*
@@ -846,10 +845,11 @@ static void __text_poke(void *addr, const void *opcode, size_t len)
 	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);

 	/*
-	 * The lock is not really needed, but this allows to avoid open-coding.
+	 * text_poke() might be used to poke spinlock primitives so do this
+	 * unlocked.
+	 * This does mean that we need to be careful that no other
+	 * context (ex. INT3 handler) is simultaneously writing to this pte.
 	 */
-	ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
-
+	ptep = __get_unlocked_pte(poking_mm, poking_addr);
 	/*
 	 * This must not fail; preallocated in poking_init().
 	 */
@@ -904,7 +904,6 @@ static void __text_poke(void *addr, const void *opcode, size_t len)
 	 */
 	BUG_ON(memcmp(addr, opcode, len));

-	pte_unmap_unlock(ptep, ptl);
 	local_irq_restore(flags);
 }

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7dd5c4ccbf85..d4a652c2e269 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1895,8 +1895,20 @@ static inline int pte_devmap(pte_t pte)
 int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot);

-extern pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
-			       spinlock_t **ptl);
+pte_t *__get_pte(struct mm_struct *mm, unsigned long addr, spinlock_t **ptl);
+
+static inline pte_t *__get_unlocked_pte(struct mm_struct *mm,
+					unsigned long addr)
+{
+	return __get_pte(mm, addr, NULL);
+}
+
+static inline pte_t *__get_locked_pte(struct mm_struct *mm,
+				      unsigned long addr, spinlock_t **ptl)
+{
+	return __get_pte(mm, addr, ptl);
+}
+
 static inline pte_t *get_locked_pte(struct mm_struct *mm, unsigned long addr,
 				    spinlock_t **ptl)
 {

diff --git a/mm/memory.c b/mm/memory.c
index 586271f3efc6..7acfe9512084 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1407,8 +1407,8 @@ void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL_GPL(zap_vma_ptes);

-pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
-			spinlock_t **ptl)
+pte_t *__get_pte(struct mm_struct *mm, unsigned long addr,
+		 spinlock_t **ptl)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -1427,7 +1427,10 @@ pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
 		return NULL;
 	VM_BUG_ON(pmd_trans_huge(*pmd));

-	return pte_alloc_map_lock(mm, pmd, addr, ptl);
+	if (likely(ptl))
+		return pte_alloc_map_lock(mm, pmd, addr, ptl);
+	else
+		return pte_alloc_map(mm, pmd, addr);
 }

 /*

From patchwork Wed Apr 8 05:03:10 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479417
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 13/26] x86/alternatives: Split __text_poke()
Date: Tue, 7 Apr 2020 22:03:10 -0700
Message-Id: <20200408050323.4237-14-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>
Separate __text_poke() into map, memcpy and unmap portions
(__text_poke_map(), __text_do_poke() and __text_poke_unmap()), so that
the non-reentrant bits are split off from the reentrant
__text_do_poke().

__text_poke_map()/_unmap() modify poking_mm and poking_addr and do the
pte-mapping, and are thus non-reentrant. This allows __text_do_poke()
to be safely called from an INT3 context, with
__text_poke_map()/_unmap() called once at the start and the end of the
patching of a call-site instead of at each of the three patching
stages.

Signed-off-by: Ankur Arora
---
 arch/x86/kernel/alternative.c | 46 +++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0344e49a4ade..337aad8c2521 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -805,13 +805,12 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;

-static void __text_poke(void *addr, const void *opcode, size_t len)
+static void __text_poke_map(void *addr, size_t len,
+			    temp_mm_state_t *prev_mm, pte_t **ptep)
 {
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
 	struct page *pages[2] = {NULL};
-	temp_mm_state_t prev;
-	unsigned long flags;
-	pte_t pte, *ptep;
+	pte_t pte;
 	pgprot_t pgprot;

 	/*
@@ -836,8 +835,6 @@ static void __text_poke(void *addr, const void *opcode, size_t len)
 	 */
 	BUG_ON(!pages[0] || (cross_page_boundary
			     && !pages[1]));

-	local_irq_save(flags);
-
 	/*
 	 * Map the page without the global bit, as TLB flushing is done with
 	 * flush_tlb_mm_range(), which is intended for non-global PTEs.
@@ -849,30 +846,42 @@ static void __text_poke(void *addr, const void *opcode, size_t len)
 	 * unlocked. This does mean that we need to be careful that no other
 	 * context (ex. INT3 handler) is simultaneously writing to this pte.
 	 */
-	ptep = __get_unlocked_pte(poking_mm, poking_addr);
+	*ptep = __get_unlocked_pte(poking_mm, poking_addr);
 	/*
 	 * This must not fail; preallocated in poking_init().
 	 */
-	VM_BUG_ON(!ptep);
+	VM_BUG_ON(!*ptep);

 	pte = mk_pte(pages[0], pgprot);
-	set_pte_at(poking_mm, poking_addr, ptep, pte);
+	set_pte_at(poking_mm, poking_addr, *ptep, pte);

 	if (cross_page_boundary) {
 		pte = mk_pte(pages[1], pgprot);
-		set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
+		set_pte_at(poking_mm, poking_addr + PAGE_SIZE, *ptep + 1, pte);
 	}

 	/*
 	 * Loading the temporary mm behaves as a compiler barrier, which
 	 * guarantees that the PTE will be set at the time memcpy() is done.
 	 */
-	prev = use_temporary_mm(poking_mm);
+	*prev_mm = use_temporary_mm(poking_mm);
+}

+/*
+ * Do the actual poke. Needs to be re-entrant as this can be called
+ * via INT3 context as well.
+ */
+static void __text_do_poke(unsigned long offset, const void *opcode, size_t len)
+{
 	kasan_disable_current();
-	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+	memcpy((u8 *)poking_addr + offset, opcode, len);
 	kasan_enable_current();
+}

+static void __text_poke_unmap(void *addr, const void *opcode, size_t len,
+			      temp_mm_state_t *prev_mm, pte_t *ptep)
+{
+	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;

 	/*
 	 * Ensure that the PTE is only cleared after the instructions of memcpy
 	 * were issued by using a compiler barrier.
@@ -888,7 +897,7 @@ static void __text_poke(void *addr, const void *opcode, size_t len)
 	 * instruction that already allows the core to see the updated version.
 	 * Xen-PV is assumed to serialize execution in a similar manner.
 	 */
-	unuse_temporary_mm(prev);
+	unuse_temporary_mm(*prev_mm);

 	/*
 	 * Flushing the TLB might involve IPIs, which would require enabled
@@ -903,7 +912,18 @@ static void __text_poke(void *addr, const void *opcode, size_t len)
 	 * fundamentally screwy; there's nothing we can really do about that.
 	 */
 	BUG_ON(memcmp(addr, opcode, len));
+}

+static void __text_poke(void *addr, const void *opcode, size_t len)
+{
+	temp_mm_state_t prev_mm;
+	unsigned long flags;
+	pte_t *ptep;
+
+	local_irq_save(flags);
+	__text_poke_map(addr, len, &prev_mm, &ptep);
+	__text_do_poke(offset_in_page(addr), opcode, len);
+	__text_poke_unmap(addr, opcode, len, &prev_mm, ptep);
 	local_irq_restore(flags);
 }

From patchwork Wed Apr 8 05:03:11 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479353
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 14/26] x86/alternatives: Handle native insns in text_poke_loc*()
Date: Tue, 7 Apr 2020 22:03:11 -0700
Message-Id: <20200408050323.4237-15-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

This is intended to handle scenarios where we might want to patch
arbitrary instructions (ex. inlined opcodes in pv_lock_ops.) Users of
the native mode (as opposed to emulated) are introduced in later
patches.

Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/text-patching.h |  4 +-
 arch/x86/kernel/alternative.c        | 61 ++++++++++++++++++++--------
 2 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 04778c2bc34e..c4b2814f2f9d 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -25,10 +25,10 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,

 /*
  * Currently, the max observed size in the kernel code is
- * JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
+ * NOP7 for indirect call, which is 7.
  * Raise it if needed.
  */
-#define POKE_MAX_OPCODE_SIZE	5
+#define POKE_MAX_OPCODE_SIZE	7

 extern void text_poke_early(void *addr, const void *opcode, size_t len);

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 337aad8c2521..004fe86f463f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -981,8 +981,15 @@ void text_poke_sync(void)

 struct text_poke_loc {
 	s32 rel_addr; /* addr := _stext + rel_addr */
-	s32 rel32;
-	u8 opcode;
+	union {
+		struct {
+			s32 rel32;
+			u8 opcode;
+		} emulated;
+		struct {
+			u8 len;
+		} native;
+	};
 	const u8 text[POKE_MAX_OPCODE_SIZE];
 };

@@ -990,6 +997,7 @@ struct bp_patching_desc {
 	struct text_poke_loc *vec;
 	int nr_entries;
 	atomic_t refs;
+	bool native;
 };

 static struct bp_patching_desc *bp_desc;
@@ -1071,10 +1079,13 @@ int notrace poke_int3_handler(struct pt_regs *regs)
 		goto out_put;
 	}

-	len = text_opcode_size(tp->opcode);
+	if (desc->native)
+		BUG();
+
+	len = text_opcode_size(tp->emulated.opcode);
 	ip += len;

-	switch (tp->opcode) {
+	switch (tp->emulated.opcode) {
 	case INT3_INSN_OPCODE:
 		/*
 		 * Someone poked an explicit INT3, they'll want to handle it,
@@ -1083,12 +1094,12 @@ int notrace poke_int3_handler(struct pt_regs *regs)
 		goto out_put;

 	case CALL_INSN_OPCODE:
-		int3_emulate_call(regs, (long)ip + tp->rel32);
+		int3_emulate_call(regs, (long)ip + tp->emulated.rel32);
 		break;

 	case JMP32_INSN_OPCODE:
 	case JMP8_INSN_OPCODE:
-		int3_emulate_jmp(regs, (long)ip + tp->rel32);
+		int3_emulate_jmp(regs, (long)ip + tp->emulated.rel32);
 		break;

 	default:
@@ -1134,6 +1145,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 		.vec = tp,
 		.nr_entries = nr_entries,
 		.refs = ATOMIC_INIT(1),
+		.native = false,
 	};
 	unsigned char int3 = INT3_INSN_OPCODE;
 	unsigned int i;
@@ -1161,7 +1173,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	 * Second step: update all but the first byte of the patched range.
 	 */
 	for (do_sync = 0, i = 0; i < nr_entries; i++) {
-		int len = text_opcode_size(tp[i].opcode);
+		int len = text_opcode_size(tp[i].emulated.opcode);

 		if (len - INT3_INSN_SIZE > 0) {
 			text_poke(text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
@@ -1205,11 +1217,25 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 }

 static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
-			       const void *opcode, size_t len, const void *emulate)
+			       const void *opcode, size_t len,
+			       const void *emulate, bool native)
 {
 	struct insn insn;

+	memset((void *)tp, 0, sizeof(*tp));
 	memcpy((void *)tp->text, opcode, len);
+
+	tp->rel_addr = addr - (void *)_stext;
+
+	/*
+	 * Native mode: when we might be poking
+	 * arbitrary (perhaps) multiple instructions.
+	 */
+	if (native) {
+		tp->native.len = (u8)len;
+		return;
+	}
+
 	if (!emulate)
 		emulate = opcode;

@@ -1219,31 +1245,30 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
 	BUG_ON(!insn_complete(&insn));
 	BUG_ON(len != insn.length);

-	tp->rel_addr = addr - (void *)_stext;
-	tp->opcode = insn.opcode.bytes[0];
+	tp->emulated.opcode = insn.opcode.bytes[0];

-	switch (tp->opcode) {
+	switch (tp->emulated.opcode) {
 	case INT3_INSN_OPCODE:
 		break;

 	case CALL_INSN_OPCODE:
 	case JMP32_INSN_OPCODE:
 	case JMP8_INSN_OPCODE:
-		tp->rel32 = insn.immediate.value;
+		tp->emulated.rel32 = insn.immediate.value;
 		break;

 	default: /* assume NOP */
 		switch (len) {
 		case 2: /* NOP2 -- emulate as JMP8+0 */
 			BUG_ON(memcmp(emulate, ideal_nops[len], len));
-			tp->opcode = JMP8_INSN_OPCODE;
-			tp->rel32 = 0;
+			tp->emulated.opcode = JMP8_INSN_OPCODE;
+			tp->emulated.rel32 = 0;
 			break;

 		case 5: /* NOP5 -- emulate as JMP32+0 */
 			BUG_ON(memcmp(emulate, ideal_nops[NOP_ATOMIC5], len));
-			tp->opcode = JMP32_INSN_OPCODE;
-			tp->rel32 = 0;
+			tp->emulated.opcode = JMP32_INSN_OPCODE;
+			tp->emulated.rel32 = 0;
 			break;

 		default: /* unknown instruction */
@@ -1299,7 +1324,7 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate)
 	text_poke_flush(addr);

 	tp = &tp_vec[tp_vec_nr++];
-	text_poke_loc_init(tp, addr, opcode, len, emulate);
+	text_poke_loc_init(tp, addr, opcode, len, emulate, false);
 }

 /**
@@ -1322,7 +1347,7 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
 		return;
 	}

-	text_poke_loc_init(&tp, addr, opcode, len, emulate);
+	text_poke_loc_init(&tp, addr, opcode, len, emulate, false);
 	text_poke_bp_batch(&tp, 1);
 }

From patchwork Wed Apr 8 05:03:12 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479329
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 15/26] x86/alternatives: Non-emulated text poking
Date: Tue, 7 Apr 2020 22:03:12 -0700
Message-Id: <20200408050323.4237-16-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>
Patching at runtime needs to handle interdependent pv-ops: as an
example, lock.queued_lock_slowpath(), lock.queued_lock_unlock() and the
other pv_lock_ops are paired and so need to be updated atomically. This
is difficult with emulation because non-patching CPUs could be
executing in critical sections. (We could apply INT3 everywhere first
and then use RCU to force a barrier, but given that spinlocks are
everywhere, it still might mean a lot of time in emulation.)

Second, locking operations can be called from interrupt handlers, which
means we cannot trivially use IPIs to introduce a pipeline sync step on
non-patching CPUs.

Third, some pv-ops can be inlined, so we would need to emulate a
broader set of operations than CALL, JMP and NOP*.

Introduce the core state machine with the actual poking and pipeline
sync stubbed out. This executes via stop_machine(), with the primary
CPU carrying out a text_poke_bp() style three-stage algorithm.

The control flow diagram below shows CPU0 as the primary, which does
the patching, while the rest of the CPUs (CPUx) execute the sync loop
in text_poke_sync_finish().
    CPU0                                CPUx
    ----                                ----
    patch_worker()                      patch_worker()

    /* Traversal, insn-gen */           text_poke_sync_finish()
    tps.patch_worker()                    /*
                                           * wait until:
      /* for each patch-site */            *  tps->state == PATCH_DONE
      text_poke_site()                     */
        poke_sync()
          ...                              ...

    smp_store_release(&tps->state, PATCH_DONE)

Commits further on flesh out the rest of the code.

Signed-off-by: Ankur Arora
---
sync_one() uses the following for pipeline synchronization:

+	if (in_nmi())
+		cpuid_eax(1);
+	else
+		sync_core();

The if (in_nmi()) clause is meant to be executed from NMI contexts.
Reading through past LKML discussions, cpuid_eax() is probably a bad
choice -- at least in so far as Xen PV is concerned. What would be a
good primitive to use instead?

Also, given that we do handle the nested NMI case, does it make sense
to just use native_iret() (via sync_core()) in NMI contexts as well?
---
 arch/x86/kernel/alternative.c | 247 ++++++++++++++++++++++++++++++++++
 1 file changed, 247 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 004fe86f463f..452d4081eded 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -979,6 +979,26 @@ void text_poke_sync(void)
 	on_each_cpu(do_sync_core, NULL, 1);
 }

+static void __maybe_unused sync_one(void)
+{
+	/*
+	 * We might be executing in NMI context, and so cannot use
+	 * IRET as a synchronizing instruction.
+	 *
+	 * We could use native_write_cr2() but that is not guaranteed
+	 * to work on Xen-PV -- it is emulated by Xen and might not
+	 * execute an iret (or similar synchronizing instruction)
+	 * internally.
+	 *
+	 * cpuid() would trap as well. Unclear if that's a solution
+	 * either.
+	 */
+	if (in_nmi())
+		cpuid_eax(1);
+	else
+		sync_core();
+}
+
 struct text_poke_loc {
 	s32 rel_addr; /* addr := _stext + rel_addr */
 	union {
@@ -1351,6 +1371,233 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
 	text_poke_bp_batch(&tp, 1);
 }

+struct text_poke_state;
+typedef void (*patch_worker_t)(struct text_poke_state *tps);
+
+/*
+ *         +-----------possible-BP----------+
+ *         |                                |
+ *  +--write-INT3--+    +--suffix--+   +-insn-prefix-+
+ * /               |  _/           |__/              |
+ * /               v' v            v                 v
+ * PATCH_SYNC_0  PATCH_SYNC_1  PATCH_SYNC_2  *PATCH_SYNC_DONE*
+ *  \                                                |`----> PATCH_DONE
+ *   `----------<---------<---------<---------<------+
+ *
+ * We start in state PATCH_SYNC_DONE and loop through PATCH_SYNC_* states
+ * to end at PATCH_DONE. The primary drives these in text_poke_site()
+ * with patch_worker() making the final transition to PATCH_DONE.
+ * All transitions but the last iteration need to be globally observed.
+ *
+ * On secondary CPUs, text_poke_sync_finish() waits in a cpu_relax()
+ * loop waiting for a transition to PATCH_SYNC_0 at which point it would
+ * start observing transitions until PATCH_SYNC_DONE.
+ * Eventually the master moves to PATCH_DONE and secondary CPUs finish.
+ */
+enum patch_state {
+	/*
+	 * Add an artificial state that we can do a bitwise operation
+	 * over all the PATCH_SYNC_* states.
+	 */
+	PATCH_SYNC_x	= 4,
+	PATCH_SYNC_0	= PATCH_SYNC_x | 0,	/* Serialize INT3 */
+	PATCH_SYNC_1	= PATCH_SYNC_x | 1,	/* Serialize rest */
+	PATCH_SYNC_2	= PATCH_SYNC_x | 2,	/* Serialize first opcode */
+	PATCH_SYNC_DONE	= PATCH_SYNC_x | 3,	/* Site done, and start state */
+
+	PATCH_DONE	= 8,			/* End state */
+};
+
+/*
+ * State for driving text-poking via stop_machine().
+ */
+struct text_poke_state {
+	/* Whatever we are poking */
+	void *stage;
+
+	/* Modules to be processed. */
+	struct list_head *head;
+
+	/*
+	 * Accesses to sync_ack_map are ordered by the primary
+	 * via tps.state.
+ */ + struct cpumask sync_ack_map; + + /* + * Generates insn sequences for call-sites to be patched and + * calls text_poke_site() to do the actual poking. + */ + patch_worker_t patch_worker; + + /* + * Where are we in the patching state-machine. + */ + enum patch_state state; + + unsigned int primary_cpu; /* CPU doing the patching. */ + unsigned int num_acks; /* Number of Acks needed. */ +}; + +static struct text_poke_state text_poke_state; + +/** + * poke_sync() - transitions to the specified state. + * + * @tps - struct text_poke_state * + * @state - one of PATCH_SYNC_* states + * @offset - offset to be patched + * @insns - insns to write + * @len - length of insn sequence + */ +static void poke_sync(struct text_poke_state *tps, int state, int offset, + const char *insns, int len) +{ + /* + * STUB: no patching or synchronization, just go through the + * motions. + */ + smp_store_release(&tps->state, state); +} + +/** + * text_poke_site() - called on the primary to patch a single call site. + * + * Returns after switching tps->state to PATCH_SYNC_DONE. + */ +static void __maybe_unused text_poke_site(struct text_poke_state *tps, + struct text_poke_loc *tp) +{ + const unsigned char int3 = INT3_INSN_OPCODE; + temp_mm_state_t prev_mm; + pte_t *ptep; + int offset; + + __text_poke_map(text_poke_addr(tp), tp->native.len, &prev_mm, &ptep); + + offset = offset_in_page(text_poke_addr(tp)); + + /* + * All secondary CPUs are waiting in tps->state == PATCH_SYNC_DONE + * to move to PATCH_SYNC_0. Poke the INT3 and wait until all CPUs + * are known to have observed PATCH_SYNC_0. + * + * The earliest we can hit an INT3 is just after the first poke. + */ + poke_sync(tps, PATCH_SYNC_0, offset, &int3, INT3_INSN_SIZE); + + /* Poke remaining */ + poke_sync(tps, PATCH_SYNC_1, offset + INT3_INSN_SIZE, + tp->text + INT3_INSN_SIZE, tp->native.len - INT3_INSN_SIZE); + + /* + * Replace the INT3 with the first opcode and force the serializing + * instruction for the last time. 
Any secondaries in the BP + * handler should be able to move past the INT3 after this. + * (See poke_int3_native() for details on this.) + */ + poke_sync(tps, PATCH_SYNC_2, offset, tp->text, INT3_INSN_SIZE); + + /* + * Force all CPUs to observe PATCH_SYNC_DONE (in the BP handler or + * in text_poke_site()), so they know that this iteration is done + * and it is safe to exit the wait-until-a-sync-is-required loop. + */ + poke_sync(tps, PATCH_SYNC_DONE, 0, NULL, 0); + + /* + * Unmap the poking_addr, poking_mm. + */ + __text_poke_unmap(text_poke_addr(tp), tp->text, tp->native.len, + &prev_mm, ptep); +} + +/** + * text_poke_sync_finish() -- called to synchronize the CPU pipeline + * on secondary CPUs for all patch sites. + * + * Called in thread context with tps->state == PATCH_SYNC_DONE. + * Returns with tps->state == PATCH_DONE. + */ +static void text_poke_sync_finish(struct text_poke_state *tps) +{ + while (true) { + enum patch_state state; + + state = READ_ONCE(tps->state); + + /* + * We aren't doing any actual poking yet, so we don't + * handle any other states. + */ + if (state == PATCH_DONE) + break; + + /* + * Relax here while the primary makes up its mind on + * whether it is done or not. + */ + cpu_relax(); + } +} + +static int patch_worker(void *t) +{ + int cpu = smp_processor_id(); + struct text_poke_state *tps = t; + + if (cpu == tps->primary_cpu) { + /* + * Generates insns and calls text_poke_site() to do the poking + * and sync. + */ + tps->patch_worker(tps); + + /* + * We are done patching. Switch the state to PATCH_DONE + * so the secondaries can exit. + */ + smp_store_release(&tps->state, PATCH_DONE); + } else { + /* Secondary CPUs spin in a sync_core() state-machine. */ + text_poke_sync_finish(tps); + } + return 0; +} + +/** + * text_poke_late() -- late patching via stop_machine(). + * + * Called holding the text_mutex. + * + * Return: 0 on success, -errno on failure.
+ */ +static int __maybe_unused text_poke_late(patch_worker_t worker, void *stage) +{ + int ret; + + lockdep_assert_held(&text_mutex); + + if (system_state != SYSTEM_RUNNING) + return -EINVAL; + + text_poke_state.stage = stage; + text_poke_state.num_acks = cpumask_weight(cpu_online_mask); + text_poke_state.head = &alt_modules; + + text_poke_state.patch_worker = worker; + text_poke_state.state = PATCH_SYNC_DONE; /* Start state */ + text_poke_state.primary_cpu = smp_processor_id(); + + /* + * Run the worker on all online CPUs. Don't need to do anything + * for offline CPUs as they come back online with a clean cache. + */ + ret = stop_machine(patch_worker, &text_poke_state, cpu_online_mask); + + return ret; +} + #ifdef CONFIG_PARAVIRT_RUNTIME struct paravirt_stage_entry { void *dest; /* pv_op destination */

From patchwork Wed Apr 8 05:03:13 2020 X-Patchwork-Submitter: Ankur Arora X-Patchwork-Id: 11479327 From: Ankur Arora To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora Subject: [RFC PATCH 16/26]
x86/alternatives: Add paravirt patching at runtime Date: Tue, 7 Apr 2020 22:03:13 -0700 Message-Id: <20200408050323.4237-17-ankur.a.arora@oracle.com> In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com> References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

Add paravirt_runtime_patch() which uses text_poke_late() to patch paravirt sites. Also add paravirt_worker(), which does the actual insn generation via generate_paravirt() (which in turn uses runtime_patch() to generate the appropriate native or paravirt insn sequences) and then calls text_poke_site() to do the actual poking.

CPU0 CPUx ---- ---- patch_worker() patch_worker() /* Traversal, insn-gen */ text_poke_sync_finish() tps.patch_worker() /* = paravirt_worker() */ /* * wait until: /* for each patch-site */ * tps->state == PATCH_DONE generate_paravirt() */ runtime_patch() text_poke_site() poke_sync() ... ...
smp_store_release(&tps->state, PATCH_DONE) Signed-off-by: Ankur Arora --- arch/x86/include/asm/text-patching.h | 2 + arch/x86/kernel/alternative.c | 98 +++++++++++++++++++++++++++- 2 files changed, 99 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index c4b2814f2f9d..e86709a8287e 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -21,6 +21,8 @@ static inline void apply_paravirt(struct paravirt_patch_site *start, #ifndef CONFIG_PARAVIRT_RUNTIME #define __parainstructions_runtime NULL #define __parainstructions_runtime_end NULL +#else +int paravirt_runtime_patch(void); #endif /* diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 452d4081eded..1c5acdc4f349 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1463,7 +1463,9 @@ static void poke_sync(struct text_poke_state *tps, int state, int offset, /** * text_poke_site() - called on the primary to patch a single call site. * - * Returns after switching tps->state to PATCH_SYNC_DONE. + * Called in thread context with tps->state == PATCH_SYNC_DONE where it + * takes tps->state through different PATCH_SYNC_* states, returning + * after having switched the tps->state back to PATCH_SYNC_DONE. */ static void __maybe_unused text_poke_site(struct text_poke_state *tps, struct text_poke_loc *tp) @@ -1598,6 +1600,16 @@ static int __maybe_unused text_poke_late(patch_worker_t worker, void *stage) return ret; } +/* + * Check if this address is still in scope of this module's .text section. 
+ */ +static bool __maybe_unused stale_address(struct alt_module *am, u8 *p) +{ + if (p < am->text || p >= am->text_end) + return true; + return false; +} + #ifdef CONFIG_PARAVIRT_RUNTIME struct paravirt_stage_entry { void *dest; /* pv_op destination */ @@ -1654,4 +1666,88 @@ void text_poke_pv_stage_zero(void) lockdep_assert_held(&text_mutex); pv_stage.count = 0; } + +/** + * generate_paravirt - fill up the insn sequence for a pv-op. + * + * @tp - address of struct text_poke_loc + * @op - the pv-op entry for this location + * @site - patch site (kernel or module text) + */ +static void generate_paravirt(struct text_poke_loc *tp, + struct paravirt_stage_entry *op, + struct paravirt_patch_site *site) +{ + int used; + + BUG_ON(site->len > POKE_MAX_OPCODE_SIZE); + + text_poke_loc_init(tp, site->instr, site->instr, site->len, NULL, true); + + /* + * Paravirt patches can patch calls (ex. mmu.tlb_flush) and + * callee_saves (ex. queued_spin_unlock). + * + * runtime_patch() calls native_patch(), or paravirt_patch() + * based on the destination. + */ + used = runtime_patch(site->type, (void *)tp->text, op->dest, + (unsigned long)site->instr, site->len); + + /* No good way to recover. */ + BUG_ON(used < 0); + + /* Pad the rest with nops */ + add_nops((void *)tp->text + used, site->len - used); +} + +/** + * paravirt_worker - generate the paravirt patching + * insns and call text_poke_site() to do the actual patching.
+ */ +static void paravirt_worker(struct text_poke_state *tps) +{ + struct paravirt_patch_site *site; + struct paravirt_stage *stage = tps->stage; + struct paravirt_stage_entry *op = &stage->ops[0]; + struct alt_module *am; + struct text_poke_loc tp; + int i; + + list_for_each_entry(am, tps->head, next) { + for (site = am->para; site < am->para_end; site++) { + if (stale_address(am, site->instr)) + continue; + + for (i = 0; i < stage->count; i++) { + if (op[i].type != site->type) + continue; + + generate_paravirt(&tp, &op[i], site); + + text_poke_site(tps, &tp); + } + } + } +} + +/** + * paravirt_runtime_patch() -- patch pv-ops, including paired ops. + * + * Called holding the text_mutex. + * + * Modify possibly multiple mutually-dependent pv-op callsites + * (ex. pv_lock_ops) using stop_machine(). + * + * Return: 0 on success, -errno on failure. + */ +int paravirt_runtime_patch(void) +{ + lockdep_assert_held(&text_mutex); + + if (!pv_stage.count) + return -EINVAL; + + return text_poke_late(paravirt_worker, &pv_stage); +} #endif /* CONFIG_PARAVIRT_RUNTIME */

From patchwork Wed Apr 8 05:03:14 2020 X-Patchwork-Submitter: Ankur Arora X-Patchwork-Id: 11479351 From: Ankur Arora To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org,
jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora Subject: [RFC PATCH 17/26] x86/alternatives: Add patching logic in text_poke_site() Date: Tue, 7 Apr 2020 22:03:14 -0700 Message-Id: <20200408050323.4237-18-ankur.a.arora@oracle.com> In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com> References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

Add actual poking and pipeline sync logic in poke_sync(), which is called from text_poke_site(). The patching logic is similar to that in text_poke_bp_batch(): we patch the first byte with an INT3, which serves as a barrier, then patch the remaining bytes, and then come back and fix up the first byte. The first and the last steps are single-byte writes and are thus atomic, and the second step is protected because the INT3 serves as a barrier.
Between each of these steps is a global pipeline sync which ensures that remote pipelines flush out any stale opcodes that they might have cached. This is driven from poke_sync() where the primary introduces a sync_core() on secondary CPUs for every PATCH_SYNC_* state change. The corresponding loop on the secondary executes in text_poke_sync_site(). Note that breakpoints are not handled yet. CPU0 CPUx ---- ---- patch_worker() patch_worker() /* Traversal, insn-gen */ text_poke_sync_finish() tps.patch_worker() /* wait until: /* = paravirt_worker() */ * tps->state == PATCH_DONE */ /* for each patch-site */ generate_paravirt() runtime_patch() text_poke_site() text_poke_sync_site() poke_sync() /* for each: __text_do_poke() * PATCH_SYNC_[012] */ sync_one() sync_one() ack() ack() wait_for_acks() ... ... smp_store_release(&tps->state, PATCH_DONE) Signed-off-by: Ankur Arora --- arch/x86/kernel/alternative.c | 103 +++++++++++++++++++++++++++++++--- 1 file changed, 95 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 1c5acdc4f349..7fdaae9edbf0 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1441,27 +1441,57 @@ struct text_poke_state { static struct text_poke_state text_poke_state; +static void wait_for_acks(struct text_poke_state *tps) +{ + int cpu = smp_processor_id(); + + cpumask_set_cpu(cpu, &tps->sync_ack_map); + + /* Wait until all CPUs are known to have observed the state change. */ + while (cpumask_weight(&tps->sync_ack_map) < tps->num_acks) + cpu_relax(); +} + /** - * poke_sync() - transitions to the specified state. + * poke_sync() - carries out one poke-step for a single site and + * transitions to the specified state. + * Called with the target populated in poking_mm and poking_addr. 
* * @tps - struct text_poke_state * * @state - one of PATCH_SYNC_* states * @offset - offset to be patched * @insns - insns to write * @len - length of insn sequence + * + * Returns after all CPUs have observed the state change and called + * sync_core(). */ static void poke_sync(struct text_poke_state *tps, int state, int offset, const char *insns, int len) { + if (len) + __text_do_poke(offset, insns, len); /* - * STUB: no patching or synchronization, just go through the - * motions. + * Stores to tps.sync_ack_map are ordered with + * smp_load_acquire(tps->state) in text_poke_sync_site() + * so we can safely clear the cpumask. */ smp_store_release(&tps->state, state); + + cpumask_clear(&tps->sync_ack_map); + + /* + * Introduce a synchronizing instruction in local and remote insn + * streams. This flushes any stale cached uops from CPU pipelines. + */ + sync_one(); + + wait_for_acks(tps); } /** * text_poke_site() - called on the primary to patch a single call site. + * The interlocking sync work on the secondary is done in text_poke_sync_site(). * * Called in thread context with tps->state == PATCH_SYNC_DONE where it * takes tps->state through different PATCH_SYNC_* states, returning @@ -1514,6 +1544,43 @@ static void __maybe_unused text_poke_site(struct text_poke_state *tps, &prev_mm, ptep); } +/** + * text_poke_sync_site() -- called to synchronize the CPU pipeline + * on secondary CPUs for each patch site. + * + * Called in thread context with tps->state == PATCH_SYNC_0. + * + * Returns after having observed tps->state == PATCH_SYNC_DONE. + */ +static void text_poke_sync_site(struct text_poke_state *tps) +{ + int cpu = smp_processor_id(); + int prevstate = -1; + int acked; + + /* + * In thread context we arrive here expecting tps->state to move + * in-order from PATCH_SYNC_{0 -> 1 -> 2} -> PATCH_SYNC_DONE. + */ + do { + /* + * Wait until there's some work for us to do. 
+ */ + smp_cond_load_acquire(&tps->state, + prevstate != VAL); + + prevstate = READ_ONCE(tps->state); + + if (prevstate < PATCH_SYNC_DONE) { + acked = cpumask_test_cpu(cpu, &tps->sync_ack_map); + + BUG_ON(acked); + sync_one(); + cpumask_set_cpu(cpu, &tps->sync_ack_map); + } + } while (prevstate < PATCH_SYNC_DONE); +} + /** * text_poke_sync_finish() -- called to synchronize the CPU pipeline * on secondary CPUs for all patch sites. @@ -1525,6 +1592,7 @@ static void text_poke_sync_finish(struct text_poke_state *tps) { while (true) { enum patch_state state; + int cpu = smp_processor_id(); state = READ_ONCE(tps->state); @@ -1535,11 +1603,24 @@ static void text_poke_sync_finish(struct text_poke_state *tps) if (state == PATCH_DONE) break; - /* - * Relax here while the primary makes up its mind on - * whether it is done or not. - */ - cpu_relax(); + if (state == PATCH_SYNC_DONE) { + /* + * Ack that we've seen the end of this iteration + * and then wait until everybody's ready to move + * to the next iteration or exit. + */ + cpumask_set_cpu(cpu, &tps->sync_ack_map); + smp_cond_load_acquire(&tps->state, + (state != VAL)); + } else if (state == PATCH_SYNC_0) { + /* + * PATCH_SYNC_1, PATCH_SYNC_2 are handled + * inside text_poke_sync_site(). + */ + text_poke_sync_site(tps); + } else { + BUG(); + } } } @@ -1549,6 +1630,12 @@ static int patch_worker(void *t) struct text_poke_state *tps = t; if (cpu == tps->primary_cpu) { + /* + * The init state is PATCH_SYNC_DONE. Wait until the + * secondaries have assembled before we start patching. + */ + wait_for_acks(tps); + /* * Generates insns and calls text_poke_site() to do the poking * and sync. 
From patchwork Wed Apr 8 05:03:15 2020 X-Patchwork-Submitter: Ankur Arora X-Patchwork-Id: 11479347 From: Ankur Arora To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora Subject: [RFC PATCH 18/26] x86/alternatives: Handle BP in non-emulated text poking Date: Tue, 7 Apr 2020 22:03:15 -0700 Message-Id: <20200408050323.4237-19-ankur.a.arora@oracle.com> In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com> References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

Handle breakpoints if we hit an INT3 either by way of an NMI while patching a site in the NMI handling path, or if we are patching text in text_poke_site() (executes on the primary), or in the pipeline sync path in text_poke_sync_site() (executes on secondary CPUs). (The last two are not expected to happen, but see below.)

The handling on the primary CPU is to update the insn stream locally such that we can return to the primary patching loop, but not to force the secondary CPUs to execute sync_core(). From my reading of the Intel spec and the thread which laid down the INT3 approach, https://lore.kernel.org/lkml/4B4D02B8.5020801@zytor.com, skipping the sync_core() would mean that remote pipelines -- if they have relevant uops cached -- would not see the updated instruction and would continue to execute stale uops.

This is safe because the primary eventually gets back to the patching loop in text_poke_site() and resumes the state-machine, re-writing some of the insn sequences just written in the BP handling and forcing the secondary CPUs to execute sync_core().

The handling on the secondary is to call text_poke_sync_site() just as in thread context; it acks the patch states so that the primary can continue making forward progress. This can be called in a re-entrant fashion. Note that this does mean that we cannot handle any patches in text_poke_sync_site() itself since that would end up being called recursively in the BP handler.
Control flow diagram with the BP handler: CPU0-BP CPUx-BP ------- ------- poke_int3_native() poke_int3_native() __text_do_poke() text_poke_sync_site() sync_one() /* for state in: * [PATCH_SYNC_y.._SYNC_DONE) */ sync_one() ack() CPU0 CPUx ---- ---- patch_worker() patch_worker() /* Traversal, insn-gen */ text_poke_sync_finish() tps.patch_worker() /* wait until: /* = paravirt_worker() */ * tps->state == PATCH_DONE */ /* for each patch-site */ generate_paravirt() runtime_patch() text_poke_site() text_poke_sync_site() poke_sync() /* for state in: __text_do_poke() * [PATCH_SYNC_0..PATCH_SYNC_y] sync_one() */ ack() sync_one() wait_for_acks() ack() ... ... smp_store_release(&tps->state, PATCH_DONE) Signed-off-by: Ankur Arora --- arch/x86/kernel/alternative.c | 145 ++++++++++++++++++++++++++++++++-- 1 file changed, 137 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 7fdaae9edbf0..c68d940356a2 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1055,6 +1055,8 @@ static int notrace patch_cmp(const void *key, const void *elt) } NOKPROBE_SYMBOL(patch_cmp); +static void poke_int3_native(struct pt_regs *regs, + struct text_poke_loc *tp); int notrace poke_int3_handler(struct pt_regs *regs) { struct bp_patching_desc *desc; @@ -1099,8 +1101,11 @@ int notrace poke_int3_handler(struct pt_regs *regs) goto out_put; } - if (desc->native) - BUG(); + if (desc->native) { + poke_int3_native(regs, tp); + ret = 1; /* handled */ + goto out_put; + } len = text_opcode_size(tp->emulated.opcode); ip += len; @@ -1469,8 +1474,15 @@ static void wait_for_acks(struct text_poke_state *tps) static void poke_sync(struct text_poke_state *tps, int state, int offset, const char *insns, int len) { - if (len) + if (len) { + /* + * Note that we could hit a BP right after patching memory + * below. This could happen before the state change further + * down. 
The primary BP handler allows us to make + * forward-progress in that case. + */ __text_do_poke(offset, insns, len); + } /* * Stores to tps.sync_ack_map are ordered with * smp_load_acquire(tps->state) in text_poke_sync_site() @@ -1504,11 +1516,22 @@ static void __maybe_unused text_poke_site(struct text_poke_state *tps, temp_mm_state_t prev_mm; pte_t *ptep; int offset; + struct bp_patching_desc desc = { + .vec = tp, + .nr_entries = 1, + .native = true, + .refs = ATOMIC_INIT(1), + }; __text_poke_map(text_poke_addr(tp), tp->native.len, &prev_mm, &ptep); offset = offset_in_page(text_poke_addr(tp)); + /* + * For INT3 use the same exclusion logic as BP emulation path. + */ + smp_store_release(&bp_desc, &desc); /* rcu_assign_pointer */ + /* * All secondary CPUs are waiting in tps->state == PATCH_SYNC_DONE * to move to PATCH_SYNC_0. Poke the INT3 and wait until all CPUs @@ -1537,6 +1560,19 @@ static void __maybe_unused text_poke_site(struct text_poke_state *tps, */ poke_sync(tps, PATCH_SYNC_DONE, 0, NULL, 0); + /* + * All CPUs have ack'd PATCH_SYNC_DONE. So there can be no + * laggard CPUs executing BP handlers. Reset bp_desc. + */ + WRITE_ONCE(bp_desc, NULL); /* RCU_INIT_POINTER */ + + /* + * We've already done the synchronization so this should not + * race. + */ + if (!atomic_dec_and_test(&desc.refs)) + atomic_cond_read_acquire(&desc.refs, !VAL); + /* * Unmap the poking_addr, poking_mm. */ @@ -1548,7 +1584,8 @@ static void __maybe_unused text_poke_site(struct text_poke_state *tps, * text_poke_sync_site() -- called to synchronize the CPU pipeline * on secondary CPUs for each patch site. * - * Called in thread context with tps->state == PATCH_SYNC_0. + * Called in thread context with tps->state == PATCH_SYNC_0 and in + * BP context with tps->state < PATCH_SYNC_DONE. * * Returns after having observed tps->state == PATCH_SYNC_DONE. 
  */
@@ -1561,6 +1598,26 @@ static void text_poke_sync_site(struct text_poke_state *tps)
 	/*
 	 * In thread context we arrive here expecting tps->state to move
 	 * in-order from PATCH_SYNC_{0 -> 1 -> 2} -> PATCH_SYNC_DONE.
+	 *
+	 * We could also arrive here in BP-context some point after having
+	 * observed bp_patching.nr_entries (and after poking the first INT3.)
+	 * This could happen by way of an NMI while we are patching a site
+	 * that'll get executed in the NMI handler, or if we hit a site
+	 * being patched in text_poke_sync_site().
+	 *
+	 * Just as in thread-context, the BP handler calls text_poke_sync_site()
+	 * to keep the primary's state-machine moving forward until it has
+	 * finished patching the call-site. At that point it is safe to
+	 * unwind the contexts.
+	 *
+	 * The second case, where we are patching a site in
+	 * text_poke_sync_site(), could end up in recursive BP handlers
+	 * and is not handled.
+	 *
+	 * Note that unlike thread-context, where the start state can only
+	 * be PATCH_SYNC_0, in BP-context the start state could be any
+	 * PATCH_SYNC_x, so long as (state < PATCH_SYNC_DONE), since once a
+	 * CPU has acked PATCH_SYNC_2, there is no INT3 left for it to observe.
 	 */
 	do {
 		/*
@@ -1571,16 +1628,88 @@ static void text_poke_sync_site(struct text_poke_state *tps)
 
 		prevstate = READ_ONCE(tps->state);
 
-		if (prevstate < PATCH_SYNC_DONE) {
-			acked = cpumask_test_cpu(cpu, &tps->sync_ack_map);
-
-			BUG_ON(acked);
+		/*
+		 * As described above, text_poke_sync_site() gets called
+		 * from both thread-context and potentially in a re-entrant
+		 * fashion in BP-context. Accordingly, expect to potentially
+		 * enter and exit this loop twice.
+		 *
+		 * Concretely, this means we need to handle the case where we
+		 * see an already acked state at BP/NMI entry, and see a
+		 * state discontinuity when returning to thread-context from
+		 * BP-context (which returns after having observed
+		 * tps->state == PATCH_SYNC_DONE).
+		 *
+		 * Help this along by always exiting with tps->state ==
+		 * PATCH_SYNC_DONE but without acking it. Not acking it in
+		 * text_poke_sync_site() guarantees that the state can only
+		 * move forward once all secondary CPUs have exited both
+		 * thread and BP-contexts.
+		 */
+		acked = cpumask_test_cpu(cpu, &tps->sync_ack_map);
+		if (prevstate < PATCH_SYNC_DONE && !acked) {
 			sync_one();
 			cpumask_set_cpu(cpu, &tps->sync_ack_map);
 		}
 	} while (prevstate < PATCH_SYNC_DONE);
 }
 
+static void poke_int3_native(struct pt_regs *regs,
+			     struct text_poke_loc *tp)
+{
+	int cpu = smp_processor_id();
+	struct text_poke_state *tps = &text_poke_state;
+
+	if (cpu != tps->primary_cpu) {
+		/*
+		 * We came here from the sync loop in text_poke_sync_site().
+		 * Continue syncing. The primary is waiting.
+		 */
+		text_poke_sync_site(tps);
+	} else {
+		int offset = offset_in_page(text_poke_addr(tp));
+
+		/*
+		 * We are in the primary context and have hit the INT3 barrier
+		 * either ourselves or via an NMI.
+		 *
+		 * The secondary CPUs at this time are either in the original
+		 * text_poke_sync_site() loop or, after having hit an NMI->INT3
+		 * themselves, in the BP text_poke_sync_site() loop.
+		 *
+		 * The minimum that we need to do here is to update the local
+		 * insn stream such that we can return to the primary loop.
+		 * Without executing sync_core() on the secondary CPUs it is
+		 * possible that some of them might be executing stale uops in
+		 * their respective pipelines.
+		 *
+		 * This should be safe because we will get back to the patching
+		 * loop in text_poke_site() in due course and will resume
+		 * the state-machine where we left off, including by re-writing
+		 * some of the insn sequences just written here.
+		 *
+		 * Note that we continue to be in poking_mm context and so can
+		 * safely call __text_do_poke() here.
+		 */
+		__text_do_poke(offset + INT3_INSN_SIZE,
+			       tp->text + INT3_INSN_SIZE,
+			       tp->native.len - INT3_INSN_SIZE);
+		__text_do_poke(offset, tp->text, INT3_INSN_SIZE);
+
+		/*
+		 * We only introduce a serializing instruction locally. As
+		 * noted above, the secondary CPUs can stay where they are --
+		 * potentially executing in the now stale INT3. This is fine
+		 * because the primary will force the sync_core() on the
+		 * secondary CPUs once it returns.
+		 */
+		sync_one();
+	}
+
+	/* A new start */
+	regs->ip -= INT3_INSN_SIZE;
+}
+
 /**
  * text_poke_sync_finish() -- called to synchronize the CPU pipeline
  * on secondary CPUs for all patch sites.

From patchwork Wed Apr 8 05:03:16 2020
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 19/26] x86/alternatives: NMI safe runtime patching
Date: Tue, 7 Apr 2020 22:03:16 -0700
Message-Id: <20200408050323.4237-20-ankur.a.arora@oracle.com>
Runtime patching can deadlock with multiple simultaneous NMIs.
This can happen while patching inter-dependent pv-ops which are
used in the NMI path (ex pv_lock_ops):

 CPU0                                   CPUx
 ----                                   ----
 patch_worker()                         patch_worker()
   /* Traversal, insn-gen */              text_poke_sync_finish()
   tps.patch_worker()                       /* wait until:
   /* = paravirt_worker() */                 *  tps->state == PATCH_DONE
                                             */
 /* start-patching:lock.spin_unlock */
   generate_paravirt()
     runtime_patch()
   text_poke_site()                         text_poke_sync_site()
     poke_sync()                              /* for state in:
       __text_do_poke()                        *  PATCH_SYNC_[012]
       ==NMI==                                 */
                                            ==NMI==
   tries-to-acquire:nmi_lock              acquires:nmi_lock
                                          tries-to-release:nmi_lock
                                            ==BP==
                                            text_poke_sync_site()
   /* waiting-for:nmi_lock */             /* waiting-for:patched-spin_unlock() */

A similar deadlock exists if two secondary CPUs get an NMI as well.

Fix this by patching NMI-unsafe ops in an NMI context. Given that
the NMI entry code ensures that NMIs do not nest, we are guaranteed
that this can be done atomically.
We do this by registering a local NMI handler (text_poke_nmi()) and
triggering a local NMI on the primary (via patch_worker_nmi()), which
then calls the same worker (tps->patch_worker()) as in thread-context.

On the secondary, we continue with the pipeline sync loop (via
text_poke_sync_finish()) in thread-context; however, if there is an
NMI on the secondary, we call text_poke_sync_finish() in the handler,
which continues the work that was being done in thread-context.

Also note that text_poke_nmi() always executes first, so we know that
it takes priority over any arbitrary code executing in the installed
NMI handlers.

 CPU0                                   CPUx
 ----                                   ----
 patch_worker(nmi=true)                 patch_worker(nmi=true)
   patch_worker_nmi() -> triggers NMI     text_poke_sync_finish()
   /* wait for return from NMI */           /* wait until:
   ...                                       *  tps->state == PATCH_DONE
   smp_store_release(&tps->state,            */
                     PATCH_DONE)
                                            /* for each patch-site */
 CPU0-NMI                                   text_poke_sync_site()
 --------                                     /* for each:
 text_poke_nmi()                               *  PATCH_SYNC_[012] */
   /* Traversal, insn-gen */                  sync_one()
   tps.patch_worker()                         ack()
   /* = paravirt_worker() */
   /* for each patch-site */                ...
   generate_paravirt()
     runtime_patch()
   text_poke_site()
     poke_sync()
       __text_do_poke()
       sync_one()
       ack()
   wait_for_acks()
   ...
Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/text-patching.h |   2 +-
 arch/x86/kernel/alternative.c        | 120 ++++++++++++++++++++++++++-
 2 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e86709a8287e..9ba329bf9479 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -22,7 +22,7 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
 #define __parainstructions_runtime	NULL
 #define __parainstructions_runtime_end	NULL
 #else
-int paravirt_runtime_patch(void);
+int paravirt_runtime_patch(bool nmi);
 #endif
 
 /*
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c68d940356a2..385c3e6ea925 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1442,6 +1442,14 @@ struct text_poke_state {
 
 	unsigned int primary_cpu; /* CPU doing the patching. */
 	unsigned int num_acks; /* Number of Acks needed. */
+
+	/*
+	 * To synchronize with the NMI handler.
+	 */
+	atomic_t nmi_work;
+
+	/* Ensure this is patched atomically against NMIs. */
+	bool nmi_context;
 };
 
 static struct text_poke_state text_poke_state;
@@ -1715,6 +1723,7 @@ static void poke_int3_native(struct pt_regs *regs,
  * on secondary CPUs for all patch sites.
  *
  * Called in thread context with tps->state == PATCH_SYNC_DONE.
+ * Also might be called from NMI context with an arbitrary tps->state.
  * Returns with tps->state == PATCH_DONE.
  */
 static void text_poke_sync_finish(struct text_poke_state *tps)
@@ -1741,6 +1750,12 @@ static void text_poke_sync_finish(struct text_poke_state *tps)
 			cpumask_set_cpu(cpu, &tps->sync_ack_map);
 			smp_cond_load_acquire(&tps->state,
 					      (state != VAL));
+		} else if (in_nmi() && (state & PATCH_SYNC_x)) {
+			/*
+			 * Called in case of NMI so we should be ready
+			 * to be called with any PATCH_SYNC_x.
+			 */
+			text_poke_sync_site(tps);
 		} else if (state == PATCH_SYNC_0) {
 			/*
 			 * PATCH_SYNC_1, PATCH_SYNC_2 are handled
@@ -1753,6 +1768,91 @@ static void text_poke_sync_finish(struct text_poke_state *tps)
 	}
 }
 
+/*
+ * text_poke_nmi() - primary CPU comes here (via self NMI) and the
+ * secondary (if there's an NMI.)
+ *
+ * By placing this NMI handler first, we can restrict execution of any
+ * NMI code that might be under patching.
+ * Local NMI handling also does not go through any locking code so it
+ * should be safe to install one.
+ *
+ * In both these roles the state-machine is identical to the one that
+ * we had in task context.
+ */
+static int text_poke_nmi(unsigned int val, struct pt_regs *regs)
+{
+	int ret, cpu = smp_processor_id();
+	struct text_poke_state *tps = &text_poke_state;
+
+	/*
+	 * We came here because there's a text-poke handler
+	 * installed. Get out if there's no work assigned yet.
+	 */
+	if (atomic_read(&tps->nmi_work) == 0)
+		return NMI_DONE;
+
+	if (cpu == tps->primary_cpu) {
+		/*
+		 * Do what we came here for. We can safely patch: any
+		 * secondary CPUs executing in NMI context have been
+		 * captured in the code below and are doing useful
+		 * work.
+		 */
+		tps->patch_worker(tps);
+
+		/*
+		 * Both the primary and the secondary CPUs are done (in NMI
+		 * or thread context.) Mark work done so any future NMIs can
+		 * skip this and go to the real handler.
+		 */
+		atomic_dec(&tps->nmi_work);
+
+		/*
+		 * The NMI was self-induced, consume it.
+		 */
+		ret = NMI_HANDLED;
+	} else {
+		/*
+		 * Unexpected NMI on a secondary CPU: do sync_core()
+		 * work until done.
+		 */
+		text_poke_sync_finish(tps);
+
+		/*
+		 * The NMI was spontaneous, not self-induced.
+		 * Don't consume it.
+		 */
+		ret = NMI_DONE;
+	}
+
+	return ret;
+}
+
+/*
+ * patch_worker_nmi() - sets up an NMI handler to do the
+ * patching work.
+ * This stops any NMIs from interrupting any code that might
+ * be getting patched.
+ */
+static void __maybe_unused patch_worker_nmi(void)
+{
+	atomic_set(&text_poke_state.nmi_work, 1);
+
+	/*
+	 * We could just use apic->send_IPI_self here. However, for reasons
+	 * that I don't understand, apic->send_IPI() or apic->send_IPI_mask()
+	 * work but apic->send_IPI_self (which internally does apic_write())
+	 * does not.
+	 */
+	apic->send_IPI(smp_processor_id(), NMI_VECTOR);
+
+	/*
+	 * Barrier to ensure that we do actually execute the NMI
+	 * before exiting.
+	 */
+	atomic_cond_read_acquire(&text_poke_state.nmi_work, !VAL);
+}
+
 static int patch_worker(void *t)
 {
 	int cpu = smp_processor_id();
@@ -1769,7 +1869,10 @@ static int patch_worker(void *t)
 	 * Generates insns and calls text_poke_site() to do the poking
 	 * and sync.
 	 */
-	tps->patch_worker(tps);
+	if (!tps->nmi_context)
+		tps->patch_worker(tps);
+	else
+		patch_worker_nmi();
 
 	/*
 	 * We are done patching. Switch the state to PATCH_DONE
@@ -1790,7 +1893,8 @@ static int patch_worker(void *t)
  *
  * Return: 0 on success, -errno on failure.
  */
-static int __maybe_unused text_poke_late(patch_worker_t worker, void *stage)
+static int __maybe_unused text_poke_late(patch_worker_t worker, void *stage,
+					 bool nmi)
 {
 	int ret;
 
@@ -1807,12 +1911,20 @@ static int __maybe_unused text_poke_late(patch_worker_t worker, void *stage)
 	text_poke_state.state = PATCH_SYNC_DONE; /* Start state */
 	text_poke_state.primary_cpu = smp_processor_id();
 
+	text_poke_state.nmi_context = nmi;
+
+	if (nmi)
+		register_nmi_handler(NMI_LOCAL, text_poke_nmi,
+				     NMI_FLAG_FIRST, "text_poke_nmi");
+
 	/*
 	 * Run the worker on all online CPUs. Don't need to do anything
 	 * for offline CPUs as they come back online with a clean cache.
 	 */
 	ret = stop_machine(patch_worker, &text_poke_state, cpu_online_mask);
 
+	if (nmi)
+		unregister_nmi_handler(NMI_LOCAL, "text_poke_nmi");
+
 	return ret;
 }
 
@@ -1957,13 +2069,13 @@ static void paravirt_worker(struct text_poke_state *tps)
  *
  * Return: 0 on success, -errno on failure.
  */
-int paravirt_runtime_patch(void)
+int paravirt_runtime_patch(bool nmi)
 {
 	lockdep_assert_held(&text_mutex);
 
 	if (!pv_stage.count)
 		return -EINVAL;
 
-	return text_poke_late(paravirt_worker, &pv_stage);
+	return text_poke_late(paravirt_worker, &pv_stage, nmi);
 }
 #endif /* CONFIG_PARAVIRT_RUNTIME */

From patchwork Wed Apr 8 05:03:17 2020
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 20/26] x86/paravirt: Enable pv-spinlocks in runtime_patch()
Date: Tue, 7 Apr 2020 22:03:17 -0700
Message-Id: <20200408050323.4237-21-ankur.a.arora@oracle.com>
Enable runtime patching of paravirt spinlocks. These can be trivially
enabled because pv_lock_ops are never preemptible -- preemption is
disabled at entry to spin_lock*().

Note that a particular CPU instance might get preempted in the host, but
because runtime_patching() is called via stop_machine(), the migration
thread would flush out any kernel threads preempted in the host.
Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/paravirt.h  | 10 +++++-----
 arch/x86/kernel/paravirt_patch.c | 12 ++++++++++++
 kernel/locking/lock_events.c     |  2 +-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 694d8daf4983..cb3d0a91c060 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -642,27 +642,27 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 static __always_inline void pv_queued_spin_lock_slowpath(struct qspinlock *lock,
 							 u32 val)
 {
-	PVOP_VCALL2(lock.queued_spin_lock_slowpath, lock, val);
+	PVRTOP_VCALL2(lock.queued_spin_lock_slowpath, lock, val);
 }
 
 static __always_inline void pv_queued_spin_unlock(struct qspinlock *lock)
 {
-	PVOP_VCALLEE1(lock.queued_spin_unlock, lock);
+	PVRTOP_VCALLEE1(lock.queued_spin_unlock, lock);
 }
 
 static __always_inline void pv_wait(u8 *ptr, u8 val)
 {
-	PVOP_VCALL2(lock.wait, ptr, val);
+	PVRTOP_VCALL2(lock.wait, ptr, val);
 }
 
 static __always_inline void pv_kick(int cpu)
 {
-	PVOP_VCALL1(lock.kick, cpu);
+	PVRTOP_VCALL1(lock.kick, cpu);
 }
 
 static __always_inline bool pv_vcpu_is_preempted(long cpu)
 {
-	return PVOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
+	return PVRTOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
 }
 
 void __raw_callee_save___native_queued_spin_unlock(struct qspinlock *lock);
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
index 3eb8c0e720b4..3f8606f2811c 100644
--- a/arch/x86/kernel/paravirt_patch.c
+++ b/arch/x86/kernel/paravirt_patch.c
@@ -152,6 +152,18 @@ int runtime_patch(u8 type, void *insn_buff, void *op,
 
 	/* Nothing whitelisted for now. */
 	switch (type) {
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+	/*
+	 * Preemption is always disabled in the lifetime of a spinlock
+	 * (whether held or while waiting to acquire.)
+	 */
+	case PARAVIRT_PATCH(lock.queued_spin_lock_slowpath):
+	case PARAVIRT_PATCH(lock.queued_spin_unlock):
+	case PARAVIRT_PATCH(lock.wait):
+	case PARAVIRT_PATCH(lock.kick):
+	case PARAVIRT_PATCH(lock.vcpu_is_preempted):
+		break;
+#endif
 	default:
 		pr_warn("type=%d unsuitable for runtime-patching\n", type);
 		return -EINVAL;
diff --git a/kernel/locking/lock_events.c b/kernel/locking/lock_events.c
index fa2c2f951c6b..c3057e82e6f9 100644
--- a/kernel/locking/lock_events.c
+++ b/kernel/locking/lock_events.c
@@ -115,7 +115,7 @@ static const struct file_operations fops_lockevent = {
 	.llseek = default_llseek,
 };
 
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#if defined(CONFIG_PARAVIRT_SPINLOCKS) && !defined(CONFIG_PARAVIRT_RUNTIME)
 #include
 
 static bool __init skip_lockevent(const char *name)

From patchwork Wed Apr 8 05:03:18 2020
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH 21/26] x86/alternatives: Paravirt runtime selftest
Date: Tue, 7 Apr 2020 22:03:18 -0700
Message-Id: <20200408050323.4237-22-ankur.a.arora@oracle.com>

Add a selftest that triggers paravirt_runtime_patch(), which toggles
between the paravirt and native pv_lock_ops.

The selftest also registers an NMI handler, which exercises the
patched pv-ops via spin-lock operations. These are triggered via
artificially sent NMIs.

And last, introduce patch sites in the primary and secondary patching
code which are hit during the patching process.
Signed-off-by: Ankur Arora
---
 arch/x86/Kconfig.debug        |  13 ++
 arch/x86/kernel/Makefile      |   1 +
 arch/x86/kernel/alternative.c |  20 +++
 arch/x86/kernel/kvm.c         |   4 +-
 arch/x86/kernel/pv_selftest.c | 264 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/pv_selftest.h |  15 ++
 6 files changed, 315 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/pv_selftest.c
 create mode 100644 arch/x86/kernel/pv_selftest.h

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2e74690b028a..82a8e3fa68c7 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -252,6 +252,19 @@ config X86_DEBUG_FPU
 
 	  If unsure, say N.
 
+config DEBUG_PARAVIRT_SELFTEST
+	bool "Enable paravirt runtime selftest"
+	depends on PARAVIRT
+	depends on PARAVIRT_RUNTIME
+	depends on PARAVIRT_SPINLOCKS
+	depends on KVM_GUEST
+	help
+	  This option enables sanity testing of the runtime paravirtualized
+	  patching code. Triggered via debugfs.
+
+	  Might help diagnose patching problems in different
+	  configurations and loads.
+
 config PUNIT_ATOM_DEBUG
 	tristate "ATOM Punit debug driver"
 	depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index ba89cabe5fcf..ed3c93681f12 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -114,6 +114,7 @@ obj-$(CONFIG_APB_TIMER)	+= apb_timer.o
 obj-$(CONFIG_AMD_NB)		+= amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
+obj-$(CONFIG_DEBUG_PARAVIRT_SELFTEST) += pv_selftest.o
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch.o
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 385c3e6ea925..26407d7a54db 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include "pv_selftest.h"
 
 int __read_mostly alternatives_patched;
 
@@ -1549,6 +1550,12 @@ static void __maybe_unused text_poke_site(struct text_poke_state *tps,
 	 */
 	poke_sync(tps, PATCH_SYNC_0, offset, &int3, INT3_INSN_SIZE);
 
+	/*
+	 * We have an INT3 in place; execute a contrived selftest that
+	 * has an insn sequence that is under patching.
+	 */
+	pv_selftest_primary();
+
 	/* Poke remaining */
 	poke_sync(tps, PATCH_SYNC_1, offset + INT3_INSN_SIZE,
 		  tp->text + INT3_INSN_SIZE,
 		  tp->native.len - INT3_INSN_SIZE);
@@ -1634,6 +1641,19 @@ static void text_poke_sync_site(struct text_poke_state *tps)
 		smp_cond_load_acquire(&tps->state,
 				      prevstate != VAL);
 
+		/*
+		 * Send an NMI to one of the other CPUs.
+		 */
+		pv_selftest_send_nmi();
+
+		/*
+		 * We have an INT3 in place; execute a contrived selftest that
+		 * has an insn sequence that is under patching.
+		 *
+		 * Note that this function is also called from BP fixup but
+		 * is just an NOP when called from there.
+		 */
+		pv_selftest_secondary();
 
 		prevstate = READ_ONCE(tps->state);
 
 		/*
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..e56d263159d7 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -779,7 +779,7 @@ arch_initcall(kvm_alloc_cpumask);
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 
 /* Kick a cpu by its apicid. Used to wake up a halted vcpu */
-static void kvm_kick_cpu(int cpu)
+void kvm_kick_cpu(int cpu)
 {
 	int apicid;
 	unsigned long flags = 0;
@@ -790,7 +790,7 @@ static void kvm_kick_cpu(int cpu)
 
 #include
 
-static void kvm_wait(u8 *ptr, u8 val)
+void kvm_wait(u8 *ptr, u8 val)
 {
 	unsigned long flags;
 
diff --git a/arch/x86/kernel/pv_selftest.c b/arch/x86/kernel/pv_selftest.c
new file mode 100644
index 000000000000..e522f444bd6e
--- /dev/null
+++ b/arch/x86/kernel/pv_selftest.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "pv_selftest.h"
+
+static int nmi_selftest;
+static bool cond_state;
+
+#define SELFTEST_PARAVIRT	1
+static int test_mode;
+
+/*
+ * Mark this and the following functions __always_inline to ensure
+ * we generate multiple patch sites that can be hit independently
+ * in thread, NMI etc contexts.
+ */
+static __always_inline void selftest_pv(void)
+{
+	struct qspinlock test;
+
+	memset(&test, 0, sizeof(test));
+
+	test.locked = _Q_LOCKED_VAL;
+
+	/*
+	 * Sits directly in the path of the test.
+	 *
+	 * The primary sets up an INT3 instruction at pv_queued_spin_unlock().
+	 * Both the primary and secondary CPUs should hit that in both
+	 * thread and NMI contexts.
+	 *
+	 * Additionally, this also gets inlined in nmi_pv_callback() so we
+	 * should hit this with nmi_selftest.
+	 *
+	 * The fixup takes place in poke_int3_native().
+ */ + pv_queued_spin_unlock(&test); +} + +static __always_inline void patch_selftest(void) +{ + if (test_mode == SELFTEST_PARAVIRT) + selftest_pv(); +} + +static DEFINE_PER_CPU(int, selftest_count); +void pv_selftest_secondary(void) +{ + /* + * On the secondary we execute the same code in both the + * thread-context and the BP-context and so would hit this + * recursively if we do inside the fixup context. + * + * So we trigger the selftest only if it's not ongoing already + * (thus allowing the thread or NMI context, but excluding + * the INT3 handling path.) + */ + if (this_cpu_read(selftest_count)) + return; + + this_cpu_inc(selftest_count); + + patch_selftest(); + + this_cpu_dec(selftest_count); +} + +void pv_selftest_primary(void) +{ + patch_selftest(); +} + +/* + * We only come here if nmi_selftest > 0. + * - nmi_selftest >= 1: execute a pv-op that will be patched + * - nmi_selftest >= 2: execute a paired pv-op that is also contended + * - nmi_selftest >= 3: add lock contention + */ +static int nmi_callback(unsigned int val, struct pt_regs *regs) +{ + static DEFINE_SPINLOCK(nmi_spin); + + if (!nmi_selftest) + goto out; + + patch_selftest(); + + if (nmi_selftest >= 2) { + /* + * Depending on whether CONFIG_[UN]INLINE_SPIN_* are + * defined or not, these would get patched or just + * create race conditions between via NMIs. + */ + spin_lock(&nmi_spin); + + /* Dilate the critical section to force contention. */ + if (nmi_selftest >= 3) + udelay(1); + + spin_unlock(&nmi_spin); + } + + /* + * nmi_selftest > 0, but we should really have a bitmap where + * to check if this really was destined for us or not. 
+ */ + return NMI_HANDLED; +out: + return NMI_DONE; +} + +void pv_selftest_register(void) +{ + register_nmi_handler(NMI_LOCAL, nmi_callback, + 0, "paravirt_nmi_selftest"); +} + +void pv_selftest_unregister(void) +{ + unregister_nmi_handler(NMI_LOCAL, "paravirt_nmi_selftest"); +} + +void pv_selftest_send_nmi(void) +{ + int cpu = smp_processor_id(); + /* NMI or INT3 */ + if (nmi_selftest && !in_interrupt()) + apic->send_IPI(cpu + 1 % num_online_cpus(), NMI_VECTOR); +} + +/* + * Just declare these locally here instead of having them be + * exposed to the whole world. + */ +void kvm_wait(u8 *ptr, u8 val); +void kvm_kick_cpu(int cpu); +bool __raw_callee_save___kvm_vcpu_is_preempted(long cpu); +static void pv_spinlocks(void) +{ + paravirt_stage_alt(cond_state, + lock.queued_spin_lock_slowpath, + __pv_queued_spin_lock_slowpath); + paravirt_stage_alt(cond_state, lock.queued_spin_unlock.func, + PV_CALLEE_SAVE(__pv_queued_spin_unlock).func); + paravirt_stage_alt(cond_state, lock.wait, kvm_wait); + paravirt_stage_alt(cond_state, lock.kick, kvm_kick_cpu); + + paravirt_stage_alt(cond_state, + lock.vcpu_is_preempted.func, + PV_CALLEE_SAVE(__kvm_vcpu_is_preempted).func); +} + +void pv_trigger(void) +{ + bool nmi_mode = nmi_selftest ? true : false; + int ret; + + pr_debug("%s: nmi=%d; NMI-mode=%d\n", __func__, nmi_selftest, nmi_mode); + + mutex_lock(&text_mutex); + + paravirt_stage_zero(); + pv_spinlocks(); + + /* + * paravirt patching for pv_locks can potentially deadlock + * if we are running with nmi_mode=false and we get an NMI. + * + * For the sake of testing that path, we risk it. However, if + * we are generating synthetic NMIs (nmi_selftest > 0) then + * run with nmi_mode=true. + */ + ret = paravirt_runtime_patch(nmi_mode); + + /* + * Flip the state so we switch the pv_lock_ops on the next test. 
+ */ + cond_state = !cond_state; + + mutex_unlock(&text_mutex); + + pr_debug("%s: nmi=%d; NMI-mode=%d, ret=%d\n", __func__, nmi_selftest, + nmi_mode, ret); +} + +static void pv_selftest_trigger(void) +{ + test_mode = SELFTEST_PARAVIRT; + pv_trigger(); +} + +static ssize_t pv_selftest_write(struct file *file, const char __user *ubuf, + size_t count, loff_t *ppos) +{ + pv_selftest_register(); + pv_selftest_trigger(); + pv_selftest_unregister(); + + return count; +} + +static ssize_t pv_nmi_read(struct file *file, char __user *ubuf, + size_t count, loff_t *ppos) +{ + char buf[32]; + unsigned int len; + + len = snprintf(buf, sizeof(buf), "%d\n", nmi_selftest); + return simple_read_from_buffer(ubuf, count, ppos, buf, len); +} + +static ssize_t pv_nmi_write(struct file *file, const char __user *ubuf, + size_t count, loff_t *ppos) +{ + char buf[32]; + unsigned int len; + unsigned int enabled; + + len = min(sizeof(buf) - 1, count); + if (copy_from_user(buf, ubuf, len)) + return -EFAULT; + + buf[len] = '\0'; + if (kstrtoint(buf, 0, &enabled)) + return -EINVAL; + + nmi_selftest = enabled > 3 ? 
3 : enabled; + + return count; +} + +static const struct file_operations pv_selftest_fops = { + .read = NULL, + .write = pv_selftest_write, + .llseek = default_llseek, +}; + +static const struct file_operations pv_nmi_fops = { + .read = pv_nmi_read, + .write = pv_nmi_write, + .llseek = default_llseek, +}; + +static int __init pv_selftest_init(void) +{ + struct dentry *d = debugfs_create_dir("pv_selftest", NULL); + + debugfs_create_file("toggle", 0600, d, NULL, &pv_selftest_fops); + debugfs_create_file("nmi", 0600, d, NULL, &pv_nmi_fops); + + return 0; +} + +late_initcall(pv_selftest_init); diff --git a/arch/x86/kernel/pv_selftest.h b/arch/x86/kernel/pv_selftest.h new file mode 100644 index 000000000000..5afa0f7db5cc --- /dev/null +++ b/arch/x86/kernel/pv_selftest.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _PVR_SELFTEST_H +#define _PVR_SELFTEST_H + +#ifdef CONFIG_DEBUG_PARAVIRT_SELFTEST +void pv_selftest_send_nmi(void); +void pv_selftest_primary(void); +void pv_selftest_secondary(void); +#else +static inline void pv_selftest_send_nmi(void) { } +static inline void pv_selftest_primary(void) { } +static inline void pv_selftest_secondary(void) { } +#endif /*! 
CONFIG_DEBUG_PARAVIRT_SELFTEST */ + +#endif /* _PVR_SELFTEST_H */ From patchwork Wed Apr 8 05:03:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ankur Arora X-Patchwork-Id: 11479423 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3283892C for ; Wed, 8 Apr 2020 05:07:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 05C0220771 for ; Wed, 8 Apr 2020 05:07:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="HAM6OV/l" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727593AbgDHFHr (ORCPT ); Wed, 8 Apr 2020 01:07:47 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:54356 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727354AbgDHFHq (ORCPT ); Wed, 8 Apr 2020 01:07:46 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 038542I1012889; Wed, 8 Apr 2020 05:07:32 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=corp-2020-01-29; bh=BUklSOLzYwRiPoSEy5WnSrk9xDtG8YVnZFCF9aTkapM=; b=HAM6OV/lJ8vmGWaH8HEJdwRVp6ooCzYRoUgChfSW/OrSpqxiPml6ta3JoC90TxAKpbMs /Aw45BvATgx0Gcb944Wm1+NEmIyniu3IdIy9ErStnhlW3Bc0dWmZTJdB3sq08USIeKo/ aqkvcvZA2mZGOwjMug+1v+JOivODST/9JEJwD93eAR9vuYHZGeissmTBP4cMlxdEJan1 U/qEsbPmwI1VaIxk6TYSwbVGnWc9dH2xDJBKxKQYq17Er/B8vs95H2Zsew/xzl6XYBFB TbYIeOc2t3aZfJw6LgrJ1ZcG29Oaic0u0EqByPailVmBhykZpOmmRBElQQBF1OUqBn75 CQ== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2130.oracle.com with ESMTP id 3091m3915p-1 
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 22/26] kvm/paravirt: Encapsulate KVM pv switching logic
Date: Tue, 7 Apr 2020 22:03:19 -0700
Message-Id: <20200408050323.4237-23-ankur.a.arora@oracle.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID: X-Mailing-List: kvm@vger.kernel.org

KVM pv-ops are dependent on KVM features/hints which are exposed
via CPUID. Decouple the probing and the enabling of these ops from
__init so they can be called post-init as well.

Signed-off-by: Ankur Arora
---
 arch/x86/Kconfig      |  1 +
 arch/x86/kernel/kvm.c | 87 ++++++++++++++++++++++++++++++-------------
 2 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 605619938f08..e0629558b6b5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -809,6 +809,7 @@ config KVM_GUEST
	depends on PARAVIRT
	select PARAVIRT_CLOCK
	select ARCH_CPUIDLE_HALTPOLL
+	select PARAVIRT_RUNTIME
	default y
	---help---
	  This option enables various optimizations for running under the KVM
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e56d263159d7..31f5ecfd3907 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -262,12 +263,20 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 }
 NOKPROBE_SYMBOL(do_async_page_fault);

+static bool kvm_pv_io_delay(void)
+{
+	bool cond = kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY);
+
+	paravirt_stage_alt(cond, cpu.io_delay, kvm_io_delay);
+
+	return cond;
+}
+
 static void __init paravirt_ops_setup(void)
 {
	pv_info.name = "KVM";

-	if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
-		pv_ops.cpu.io_delay = kvm_io_delay;
+	kvm_pv_io_delay();

 #ifdef CONFIG_X86_IO_APIC
	no_timer_check = 1;
@@ -432,6 +441,15 @@ static bool pv_tlb_flush_supported(void)
		kvm_para_has_feature(KVM_FEATURE_STEAL_TIME));
 }

+static bool kvm_pv_steal_clock(void)
+{
+	bool cond = kvm_para_has_feature(KVM_FEATURE_STEAL_TIME);
+
+	paravirt_stage_alt(cond, time.steal_clock, kvm_steal_clock);
+
+	return cond;
+}
+
 static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);

 #ifdef CONFIG_SMP
@@ -624,6 +642,17 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
	native_flush_tlb_others(flushmask, info);
 }

+static bool kvm_pv_tlb(void)
+{
+	bool cond = pv_tlb_flush_supported();
+
+	paravirt_stage_alt(cond, mmu.flush_tlb_others,
+			   kvm_flush_tlb_others);
+	paravirt_stage_alt(cond, mmu.tlb_remove_table,
+			   tlb_remove_table);
+	return cond;
+}
+
 static void __init kvm_guest_init(void)
 {
	int i;
@@ -635,16 +664,11 @@ static void __init kvm_guest_init(void)
	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
		x86_init.irqs.trap_init = kvm_apf_trap_init;

-	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
+	if (kvm_pv_steal_clock())
		has_steal_clock = 1;
-		pv_ops.time.steal_clock = kvm_steal_clock;
-	}

-	if (pv_tlb_flush_supported()) {
-		pv_ops.mmu.flush_tlb_others = kvm_flush_tlb_others;
-		pv_ops.mmu.tlb_remove_table = tlb_remove_table;
+	if (kvm_pv_tlb())
		pr_info("KVM setup pv remote TLB flush\n");
-	}

	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
		apic_set_eoi_write(kvm_guest_apic_eoi_write);
@@ -849,33 +873,46 @@ asm(
 #endif

+static inline bool kvm_para_lock_ops(void)
+{
+	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
+	return kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) &&
+		!kvm_para_has_hint(KVM_HINTS_REALTIME);
+}
+
+static bool kvm_pv_spinlock(void)
+{
+	bool cond = kvm_para_lock_ops();
+	bool preempt_cond = cond &&
+			kvm_para_has_feature(KVM_FEATURE_STEAL_TIME);
+
+	paravirt_stage_alt(cond, lock.queued_spin_lock_slowpath,
+			   __pv_queued_spin_lock_slowpath);
+	paravirt_stage_alt(cond, lock.queued_spin_unlock.func,
+			   PV_CALLEE_SAVE(__pv_queued_spin_unlock).func);
+	paravirt_stage_alt(cond, lock.wait, kvm_wait);
+	paravirt_stage_alt(cond, lock.kick, kvm_kick_cpu);
+
+	paravirt_stage_alt(preempt_cond,
+			   lock.vcpu_is_preempted.func,
+			   PV_CALLEE_SAVE(__kvm_vcpu_is_preempted).func);
+	return cond;
+}
+
 /*
  * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
  */
 void __init kvm_spinlock_init(void)
 {
-	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
-	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
-		return;
-
-	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
-		return;

	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
	if (num_possible_cpus() == 1)
		return;

-	__pv_init_lock_hash();
-	pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
-	pv_ops.lock.queued_spin_unlock =
-		PV_CALLEE_SAVE(__pv_queued_spin_unlock);
-	pv_ops.lock.wait = kvm_wait;
-	pv_ops.lock.kick = kvm_kick_cpu;
+	if (!kvm_pv_spinlock())
+		return;

-	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
-		pv_ops.lock.vcpu_is_preempted =
-			PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
-	}
+	__pv_init_lock_hash();
 }

 #endif /* CONFIG_PARAVIRT_SPINLOCKS */

From patchwork Wed Apr 8 05:03:20 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479345
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 23/26] x86/kvm: Add worker to trigger runtime patching
Date: Tue, 7 Apr 2020 22:03:20 -0700
Message-Id: <20200408050323.4237-24-ankur.a.arora@oracle.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

Make __pv_init_lock_hash() conditional on either paravirt spinlocks
being enabled (via kvm_pv_spinlock()) or on paravirt spinlocks
potentially getting enabled later (runtime patching via
CONFIG_PARAVIRT_RUNTIME.)

Also add a handler for CPUID reprobe which can trigger this patching.

Signed-off-by: Ankur Arora
---
 arch/x86/kernel/kvm.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 31f5ecfd3907..1cb7eab805a6 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include

 static int kvmapf = 1;
@@ -909,12 +910,15 @@ void __init kvm_spinlock_init(void)
	if (num_possible_cpus() == 1)
		return;

-	if (!kvm_pv_spinlock())
-		return;
-
-	__pv_init_lock_hash();
+	/*
+	 * Allocate if pv_spinlocks are enabled or if we might
+	 * end up patching them in later.
+	 */
+	if (kvm_pv_spinlock() || IS_ENABLED(CONFIG_PARAVIRT_RUNTIME))
+		__pv_init_lock_hash();
 }
-
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline bool kvm_pv_spinlock(void) { return false; }
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */

 #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
@@ -952,3 +956,23 @@ void arch_haltpoll_disable(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
 #endif
+
+#ifdef CONFIG_PARAVIRT_RUNTIME
+void kvm_trigger_reprobe_cpuid(struct work_struct *work)
+{
+	mutex_lock(&text_mutex);
+
+	paravirt_stage_zero();
+
+	kvm_pv_steal_clock();
+	kvm_pv_tlb();
+	paravirt_runtime_patch(false);
+
+	paravirt_stage_zero();
+
+	kvm_pv_spinlock();
+	paravirt_runtime_patch(true);
+
+	mutex_unlock(&text_mutex);
+}
+#endif /* CONFIG_PARAVIRT_RUNTIME */

From patchwork Wed Apr 8 05:03:21 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479425
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora
Subject: [RFC PATCH 24/26] x86/kvm: Support dynamic CPUID hints
Date: Tue, 7 Apr 2020 22:03:21 -0700
Message-Id: <20200408050323.4237-25-ankur.a.arora@oracle.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>

A change in the state of a KVM hint like KVM_HINTS_REALTIME can have a
significant performance impact. Given that a hint might not be stable
across the lifetime of a guest, dynamic hints allow the host to inform
the guest when a hint changes.

Do this via a KVM CPUID leaf in %ecx. If the guest has registered a
callback via MSR_KVM_HINT_VECTOR, the hint change is notified to it by
means of a callback triggered via the vcpu ioctl KVM_CALLBACK.

Signed-off-by: Ankur Arora
---
The callback vector is currently tied in with the hint notification
and can (should) be made more generic, such that we could deliver
arbitrary callbacks on it.

One use might be TSC frequency switching notifications for emulated
Hyper-V guests.
---
 Documentation/virt/kvm/api.rst       | 17 ++++++++++++
 Documentation/virt/kvm/cpuid.rst     |  9 +++++--
 arch/x86/include/asm/kvm_host.h      |  6 +++++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/x86.c                   | 39 ++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h             |  4 +++
 7 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index efbbe570aa9b..40a9b22d6979 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4690,6 +4690,17 @@ KVM_PV_VM_VERIFY
   Verify the integrity of the unpacked image. Only if this succeeds,
   KVM is allowed to start protected VCPUs.

+4.126 KVM_CALLBACK
+------------------
+
+:Capability: KVM_CAP_CALLBACK
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: none
+:Returns: 0 on success, -1 on error
+
+Queues a callback on the guest's vcpu if a callback has been registered.
+
 5. The kvm_run structure
 ========================

@@ -6109,3 +6120,9 @@ KVM can therefore start protected VMs.
 This capability governs the KVM_S390_PV_COMMAND ioctl and the
 KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
 guests when the state change is invalid.
+
+8.24 KVM_CAP_CALLBACK
+
+Architectures: x86_64
+
+This capability indicates that the ioctl KVM_CALLBACK is available.
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..5a997c9e74c0 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,9 @@ KVM_FEATURE_PV_SCHED_YIELD         13         guest checks this feature bit
                                               before using paravirtualized
                                               sched yield.

+KVM_FEATURE_DYNAMIC_HINTS          14         guest handles feature hints
+                                              changing under it.
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24         host will warn if no guest-side
                                               per-cpu warps are expected in
                                               kvmclock
@@ -93,9 +96,11 @@ KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24         host will warn if no guest-side

 ::

-      edx = an OR'ed group of (1 << flag)
+      ecx, edx = an OR'ed group of (1 << flag)

-Where ``flag`` here is defined as below:
+Where the ``flag`` in ecx is the currently applicable hints, and ``flag``
+in edx is the union of all hints ever provided to the guest, both drawn
+from the set listed below:

 ================== ============ =================================
 flag               value        meaning
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..4f061550274d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -723,6 +723,8 @@ struct kvm_vcpu_arch {
	bool nmi_injected;    /* Trying to inject an NMI this entry */
	bool smi_pending;    /* SMI queued after currently running handler */

+	bool callback_pending;	/* Callback queued after running handler */
+
	struct kvm_mtrr mtrr_state;
	u64 pat;
@@ -982,6 +984,10 @@ struct kvm_arch {
	struct kvm_pmu_event_filter *pmu_event_filter;
	struct task_struct *nx_lpage_recovery_thread;
+
+	struct {
+		u8 vector;
+	} callback;
 };

 struct kvm_vm_stat {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..bf016e232f2f 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,6 +31,7 @@
 #define KVM_FEATURE_PV_SEND_IPI	11
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
+#define KVM_FEATURE_DYNAMIC_HINTS	14

 #define KVM_HINTS_REALTIME      0
@@ -50,6 +51,7 @@
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL    0x4b564d05
+#define MSR_KVM_HINT_VECTOR	0x4b564d06

 struct kvm_steal_time {
	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 901cd1fdecd9..db6a4c4d9430 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
			     (1 << KVM_FEATURE_PV_SEND_IPI) |
			     (1 << KVM_FEATURE_POLL_CONTROL) |
-			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
+			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+			     (1 << KVM_FEATURE_DYNAMIC_HINTS);

		if (sched_info_on())
			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8124b562dea..838d033bf5ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1282,6 +1282,7 @@ static const u32 emulated_msrs_all[] = {
	MSR_K7_HWCR,
	MSR_KVM_POLL_CONTROL,
+	MSR_KVM_HINT_VECTOR,
 };

 static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
@@ -2910,7 +2911,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
		vcpu->arch.msr_kvm_poll_control = data;
		break;
+	case MSR_KVM_HINT_VECTOR: {
+		u8 vector = (u8)data;

+		if ((u64)data > 0xffUL)
+			return 1;
+
+		vcpu->kvm->arch.callback.vector = vector;
+		break;
+	}
	case MSR_IA32_MCG_CTL:
	case MSR_IA32_MCG_STATUS:
	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -3156,6 +3165,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
	case MSR_KVM_POLL_CONTROL:
		msr_info->data = vcpu->arch.msr_kvm_poll_control;
		break;
+	case MSR_KVM_HINT_VECTOR:
+		msr_info->data = vcpu->kvm->arch.callback.vector;
+		break;
	case MSR_IA32_P5_MC_ADDR:
	case MSR_IA32_P5_MC_TYPE:
	case MSR_IA32_MCG_CAP:
@@ -3373,6 +3385,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
	case KVM_CAP_GET_MSR_FEATURES:
	case KVM_CAP_MSR_PLATFORM_INFO:
	case KVM_CAP_EXCEPTION_PAYLOAD:
+	case KVM_CAP_CALLBACK:
		r = 1;
		break;
	case KVM_CAP_SYNC_REGS:
@@ -3721,6 +3734,20 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
	return 0;
 }

+static int kvm_vcpu_ioctl_callback(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Has the guest set up a callback?
+	 */
+	if (vcpu->kvm->arch.callback.vector) {
+		vcpu->arch.callback_pending = true;
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
 static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
 {
	kvm_inject_nmi(vcpu);
@@ -4611,6 +4638,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
		r = 0;
		break;
	}
+	case KVM_CALLBACK: {
+		r = kvm_vcpu_ioctl_callback(vcpu);
+		break;
+	}
	default:
		r = -EINVAL;
	}
@@ -7737,6 +7768,14 @@ static int inject_pending_event(struct kvm_vcpu *vcpu)
		--vcpu->arch.nmi_pending;
		vcpu->arch.nmi_injected = true;
		kvm_x86_ops.set_nmi(vcpu);
+	} else if (vcpu->arch.callback_pending) {
+		if (kvm_x86_ops.interrupt_allowed(vcpu)) {
+			vcpu->arch.callback_pending = false;
+			kvm_queue_interrupt(vcpu,
+					    vcpu->kvm->arch.callback.vector,
+					    false);
+			kvm_x86_ops.set_irq(vcpu);
+		}
	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
		/*
		 * Because interrupts can be injected asynchronously, we are
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 428c7dde6b4b..5401c056742c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1017,6 +1017,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_VCPU_RESETS 179
 #define KVM_CAP_S390_PROTECTED 180
 #define KVM_CAP_PPC_SECURE_GUEST 181
+#define KVM_CAP_CALLBACK 182

 #ifdef KVM_CAP_IRQ_ROUTING
@@ -1518,6 +1519,9 @@ struct kvm_pv_cmd {
 /* Available with KVM_CAP_S390_PROTECTED */
 #define KVM_S390_PV_COMMAND	_IOWR(KVMIO, 0xc5, struct kvm_pv_cmd)

+/* Available with KVM_CAP_CALLBACK */
+#define KVM_CALLBACK _IO(KVMIO, 0xc6)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
	/* Guest initialization commands */

From patchwork Wed Apr 8 05:03:22 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479331
with ESMTP id E0E291894 for ; Wed, 8 Apr 2020 05:05:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BFDD92082F for ; Wed, 8 Apr 2020 05:05:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="PlrYMSP2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726819AbgDHFFs (ORCPT ); Wed, 8 Apr 2020 01:05:48 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:38234 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726798AbgDHFFr (ORCPT ); Wed, 8 Apr 2020 01:05:47 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03852xa5179539; Wed, 8 Apr 2020 05:05:36 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=corp-2020-01-29; bh=7Qqvi8ilezudja6Xt6TMpZeT8QAYzWIZNCTpskFpIKg=; b=PlrYMSP2h6k1ZitVeuybm/+AYAE+sB8vRMB0+kQOvow5oszNXceJgjEAAvFiIlmk8dfT 43T9eS07CTuXT5UodAkvV/QFJS1fQXtS2c29x/cctBkqEX6jmAfoJcWHJC0DTzKAUQWR iUbfgYI6KmSYNMk70LnKGdaOG/+UOyrjPqDPs2t9xMboURdOeol/NYZGkwptOCiG14T0 sbJWNxq07XmOG8CdPD29AbXQVHyC/cFAqZVn9RBzyVvGYcPANFGGd78XxPiAgZ4dOBxM VvAY06D+9EnXtAWkr8uALBZMh621nLJ+hx8OYkjVKw5z/RXOjk5BS3CqcxHaLinqXVoO jQ== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by userp2120.oracle.com with ESMTP id 3091mnh158-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 08 Apr 2020 05:05:36 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03852gSb148301; Wed, 8 Apr 2020 05:05:35 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3030.oracle.com with ESMTP id 3091kgj7qp-1 
(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 08 Apr 2020 05:05:35 +0000 Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 03855Xes022191; Wed, 8 Apr 2020 05:05:33 GMT Received: from monad.ca.oracle.com (/10.156.75.81) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 07 Apr 2020 22:05:33 -0700 From: Ankur Arora To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora Subject: [RFC PATCH 25/26] x86/kvm: Guest support for dynamic hints Date: Tue, 7 Apr 2020 22:03:22 -0700 Message-Id: <20200408050323.4237-26-ankur.a.arora@oracle.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com> References: <20200408050323.4237-1-ankur.a.arora@oracle.com> MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9584 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 bulkscore=0 suspectscore=0 spamscore=0 malwarescore=0 adultscore=0 phishscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004080037 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9584 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 mlxlogscore=999 mlxscore=0 priorityscore=1501 bulkscore=0 adultscore=0 impostorscore=0 phishscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004080037 Sender: kvm-owner@vger.kernel.org Precedence: bulk 
If the hypervisor supports KVM_FEATURE_DYNAMIC_HINTS, register a callback vector (currently chosen to be HYPERVISOR_CALLBACK_VECTOR). The callback triggers on a change in the active hints, which are exported via KVM CPUID in %ecx. Trigger re-evaluation of KVM_HINTS based on the change in their active status. Signed-off-by: Ankur Arora --- arch/x86/Kconfig | 1 + arch/x86/entry/entry_64.S | 5 +++ arch/x86/include/asm/kvm_para.h | 7 ++++ arch/x86/kernel/kvm.c | 58 ++++++++++++++++++++++++++++++--- include/asm-generic/kvm_para.h | 4 +++ include/linux/kvm_para.h | 5 +++ 6 files changed, 76 insertions(+), 4 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e0629558b6b5..23b239d184fc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -810,6 +810,7 @@ config KVM_GUEST select PARAVIRT_CLOCK select ARCH_CPUIDLE_HALTPOLL select PARAVIRT_RUNTIME + select X86_HV_CALLBACK_VECTOR default y ---help--- This option enables various optimizations for running under the KVM diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 0e9504fabe52..96b2a243c54f 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1190,6 +1190,11 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ acrn_hv_callback_vector acrn_hv_vector_handler #endif +#if IS_ENABLED(CONFIG_KVM_GUEST) +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ + kvm_callback_vector kvm_do_callback +#endif + idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET idtentry int3 do_int3 has_error_code=0 create_gap=1 idtentry stack_segment do_stack_segment has_error_code=1 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 9b4df6eaa11a..5a7ca5639c2e 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -88,11 +88,13 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1, bool kvm_para_available(void); unsigned int
kvm_arch_para_features(void); unsigned int kvm_arch_para_hints(void); +unsigned int kvm_arch_para_active_hints(void); void kvm_async_pf_task_wait(u32 token, int interrupt_kernel); void kvm_async_pf_task_wake(u32 token); u32 kvm_read_and_reset_pf_reason(void); extern void kvm_disable_steal_time(void); void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address); +void kvm_callback_vector(struct pt_regs *regs); #ifdef CONFIG_PARAVIRT_SPINLOCKS void __init kvm_spinlock_init(void); @@ -121,6 +123,11 @@ static inline unsigned int kvm_arch_para_hints(void) return 0; } +static inline unsigned int kvm_arch_para_active_hints(void) +{ + return 0; +} + static inline u32 kvm_read_and_reset_pf_reason(void) { return 0; diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 1cb7eab805a6..163b7a7ec5f9 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -25,6 +25,8 @@ #include #include #include +#include +#include #include #include #include @@ -438,7 +440,7 @@ static void __init sev_map_percpu_data(void) static bool pv_tlb_flush_supported(void) { return (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) && - !kvm_para_has_hint(KVM_HINTS_REALTIME) && + !kvm_para_has_active_hint(KVM_HINTS_REALTIME) && kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)); } @@ -463,7 +465,7 @@ static bool pv_ipi_supported(void) static bool pv_sched_yield_supported(void) { return (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) && - !kvm_para_has_hint(KVM_HINTS_REALTIME) && + !kvm_para_has_active_hint(KVM_HINTS_REALTIME) && kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)); } @@ -568,7 +570,7 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask) static void __init kvm_smp_prepare_cpus(unsigned int max_cpus) { native_smp_prepare_cpus(max_cpus); - if (kvm_para_has_hint(KVM_HINTS_REALTIME)) + if (kvm_para_has_active_hint(KVM_HINTS_REALTIME)) static_branch_disable(&virt_spin_lock_key); } @@ -654,6 +656,13 @@ static bool kvm_pv_tlb(void) return 
cond; } +#ifdef CONFIG_PARAVIRT_RUNTIME +static bool has_dynamic_hint; +static void __init kvm_register_callback_vector(void); +#else +#define has_dynamic_hint false +#endif /* CONFIG_PARAVIRT_RUNTIME */ + static void __init kvm_guest_init(void) { int i; @@ -674,6 +683,12 @@ static void __init kvm_guest_init(void) if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) apic_set_eoi_write(kvm_guest_apic_eoi_write); + if (IS_ENABLED(CONFIG_PARAVIRT_RUNTIME) && + kvm_para_has_feature(KVM_FEATURE_DYNAMIC_HINTS)) { + kvm_register_callback_vector(); + has_dynamic_hint = true; + } + #ifdef CONFIG_SMP smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus; smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; @@ -729,12 +744,27 @@ unsigned int kvm_arch_para_features(void) return cpuid_eax(kvm_cpuid_base() | KVM_CPUID_FEATURES); } +/* + * Universe of hints that's ever been given to this guest. + */ unsigned int kvm_arch_para_hints(void) { return cpuid_edx(kvm_cpuid_base() | KVM_CPUID_FEATURES); } EXPORT_SYMBOL_GPL(kvm_arch_para_hints); +/* + * Currently active set of hints. Reading can race with modifications. + */ +unsigned int kvm_arch_para_active_hints(void) +{ + if (has_dynamic_hint) + return cpuid_ecx(kvm_cpuid_base() | KVM_CPUID_FEATURES); + else + return kvm_arch_para_hints(); +} +EXPORT_SYMBOL_GPL(kvm_arch_para_active_hints); + static uint32_t __init kvm_detect(void) { return kvm_cpuid_base(); @@ -878,7 +908,7 @@ static inline bool kvm_para_lock_ops(void) { /* Does host kernel support KVM_FEATURE_PV_UNHALT? 
*/ return kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) && - !kvm_para_has_hint(KVM_HINTS_REALTIME); + !kvm_para_has_active_hint(KVM_HINTS_REALTIME); } static bool kvm_pv_spinlock(void) @@ -975,4 +1005,24 @@ void kvm_trigger_reprobe_cpuid(struct work_struct *work) mutex_unlock(&text_mutex); } + +static DECLARE_WORK(trigger_reprobe, kvm_trigger_reprobe_cpuid); + +void __irq_entry kvm_do_callback(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + + irq_enter(); + inc_irq_stat(irq_hv_callback_count); + + schedule_work(&trigger_reprobe); + irq_exit(); + set_irq_regs(old_regs); +} + +static void __init kvm_register_callback_vector(void) +{ + alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, kvm_callback_vector); + wrmsrl(MSR_KVM_HINT_VECTOR, HYPERVISOR_CALLBACK_VECTOR); +} #endif /* CONFIG_PARAVIRT_RUNTIME */ diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h index 728e5c5706c4..4a575299ad62 100644 --- a/include/asm-generic/kvm_para.h +++ b/include/asm-generic/kvm_para.h @@ -24,6 +24,10 @@ static inline unsigned int kvm_arch_para_hints(void) return 0; } +static inline unsigned int kvm_arch_para_active_hints(void) +{ + return 0; +} static inline bool kvm_para_available(void) { return false; diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index f23b90b02898..c98d3944d25a 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -14,4 +14,9 @@ static inline bool kvm_para_has_hint(unsigned int feature) { return !!(kvm_arch_para_hints() & (1UL << feature)); } + +static inline bool kvm_para_has_active_hint(unsigned int feature) +{ + return !!(kvm_arch_para_active_hints() & BIT(feature)); +} #endif /* __LINUX_KVM_PARA_H */
From patchwork Wed Apr 8 05:03:23 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11479337
From: Ankur Arora
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com, namit@vmware.com, mhiramat@kernel.org, jgross@suse.com, bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com, boris.ostrovsky@oracle.com, mihai.carabas@oracle.com, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, virtualization@lists.linux-foundation.org, Ankur Arora, Joao Martins
Subject: [RFC PATCH 26/26] x86/kvm: Add hint change notifier for KVM_HINT_REALTIME
Date: Tue, 7 Apr 2020 22:03:23 -0700
Message-Id: <20200408050323.4237-27-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>
References: <20200408050323.4237-1-ankur.a.arora@oracle.com>
Add a blocking notifier that triggers when the host sends a hint change notification. Suggested-by: Joao Martins Signed-off-by: Ankur Arora --- arch/x86/include/asm/kvm_para.h | 10 ++++++++++ arch/x86/kernel/kvm.c | 16 ++++++++++++++++ include/asm-generic/kvm_para.h | 8 ++++++++ 3 files changed, 34 insertions(+) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 5a7ca5639c2e..54c3c7a3225e 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -2,6 +2,7 @@ #ifndef _ASM_X86_KVM_PARA_H #define _ASM_X86_KVM_PARA_H +#include #include #include #include @@ -96,6 +97,9 @@ extern void kvm_disable_steal_time(void); void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address); void kvm_callback_vector(struct pt_regs *regs); +void kvm_realtime_notifier_register(struct notifier_block *nb); +void kvm_realtime_notifier_unregister(struct notifier_block *nb); + #ifdef CONFIG_PARAVIRT_SPINLOCKS void __init kvm_spinlock_init(void); #else /* !CONFIG_PARAVIRT_SPINLOCKS */ @@ -137,6 +141,14 @@ static inline void kvm_disable_steal_time(void) { return; } + +static inline void kvm_realtime_notifier_register(struct notifier_block *nb) +{ +} + +static inline void kvm_realtime_notifier_unregister(struct notifier_block *nb) +{ +} #endif #endif /* _ASM_X86_KVM_PARA_H */ diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 163b7a7ec5f9..35ba4a837027 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -951,6 +951,20 @@ void __init kvm_spinlock_init(void) static inline bool kvm_pv_spinlock(void) { return false; } #endif /* CONFIG_PARAVIRT_SPINLOCKS */ +static BLOCKING_NOTIFIER_HEAD(realtime_notifier); + +void kvm_realtime_notifier_register(struct notifier_block *nb) +{ + blocking_notifier_chain_register(&realtime_notifier, nb);
+} +EXPORT_SYMBOL_GPL(kvm_realtime_notifier_register); + +void kvm_realtime_notifier_unregister(struct notifier_block *nb) +{ + blocking_notifier_chain_unregister(&realtime_notifier, nb); +} +EXPORT_SYMBOL_GPL(kvm_realtime_notifier_unregister); + #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL static void kvm_disable_host_haltpoll(void *i) @@ -1004,6 +1018,8 @@ void kvm_trigger_reprobe_cpuid(struct work_struct *work) paravirt_runtime_patch(true); mutex_unlock(&text_mutex); + + blocking_notifier_call_chain(&realtime_notifier, 0, NULL); } static DECLARE_WORK(trigger_reprobe, kvm_trigger_reprobe_cpuid); diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h index 4a575299ad62..d443531b49ac 100644 --- a/include/asm-generic/kvm_para.h +++ b/include/asm-generic/kvm_para.h @@ -33,4 +33,12 @@ static inline bool kvm_para_available(void) return false; } +static inline void kvm_realtime_notifier_register(struct notifier_block *nb) +{ +} + +static inline void kvm_realtime_notifier_unregister(struct notifier_block *nb) +{ +} + #endif