From patchwork Tue Oct 10 08:35:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415011 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89777CD68FE for ; Tue, 10 Oct 2023 08:35:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442923AbjJJIfm (ORCPT ); Tue, 10 Oct 2023 04:35:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442897AbjJJIfg (ORCPT ); Tue, 10 Oct 2023 04:35:36 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 990BA9F; Tue, 10 Oct 2023 01:35:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926934; x=1728462934; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XRUCF/P5uKgDye69B2VNPFxzhkCFQCyWEdXoWuKd6KA=; b=A/4c6uBGxvJpTwMiDq1ASuH8uLoKBG+XDxMXAxfMBWGDeGTZaKoxzNIs ifRhrWEMMoK+vYRC8eF5bFd95nEz5SXoozqeq2nMmGDsxOmPpWTVS5jk1 3aIO1eF+XBUMYs6X3Qn5t4Hhj6ePD55RajVLCJepyVOOpCzg2Xf8N5SzO to2Dy8lz7ErcEIUUfrxD0ng8upzQkgLamx1PxmMdo3+hgTt3IAwup1ymn md5dCoBvd7DaLqxAcczdc6a24mWidEEF8Hwmrq1mDp7SuD/rmeBbFiLsQ MCU50FbCHgoR3aucDeyo2zpus4sgsLn8nQxnCdR7IlTIK4F62nUMRGtgI w==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="387176588" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="387176588" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="730001327" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="730001327" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:32 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 01/12] x86/mce: Fix hw MCE injection feature detection Date: Tue, 10 Oct 2023 01:35:09 -0700 Message-Id: <23c6fa20777498bccd486aedc435eef9af174748.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata When initializing x86 MCE injection framework, it checks if hardware mce injection is available or not. When it's not available on AMD, set the boolean variable to false to not use it. The variable is on by default and the feature is AMD specific based on the code. Because the variable is default on, it is true on Intel platform (probably on other non-AMD x86 platform). It results in unchecked msr access of MSR_K7_HWCR=0xc0010015 when injecting MCE on Intel platform. (Probably on other x86 platform.) Make the variable of by default, and set the variable on when the hardware feature is usable. The procedure to produce on Intel platform. echo hw > /sys/kernel/debug/mce-inject/flags This succeeds. set other MCE parameters if necessary. echo 0 > /sys/kernel/debug/mce-inject/bank This triggers mce injection. The kernel accesses MSR_K7_HWCR=0xc0010015 resulting unchecked MSR access. Fixes: 891e465a1bd8 ("x86/mce: Check whether writes to MCA_STATUS are getting ignored") Signed-off-by: Isaku Yamahata --- arch/x86/kernel/cpu/mce/inject.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index 4d8d4bcf915d..881898a1d2f4 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -33,7 +33,7 @@ #include "internal.h" -static bool hw_injection_possible = true; +static bool hw_injection_possible; /* * Collect all the MCi_XXX settings @@ -732,6 +732,7 @@ static void check_hw_inj_possible(void) if (!cpu_feature_enabled(X86_FEATURE_SMCA)) return; + hw_injection_possible = true; cpu = get_cpu(); for (bank = 0; bank < MAX_NR_BANKS; ++bank) { From patchwork Tue Oct 10 08:35:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415012 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10D3ECD690A for ; Tue, 10 Oct 2023 08:35:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442944AbjJJIfr (ORCPT ); Tue, 10 Oct 2023 04:35:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43742 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442904AbjJJIfg (ORCPT ); Tue, 10 Oct 2023 04:35:36 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D3700A9; Tue, 10 Oct 2023 01:35:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926934; x=1728462934; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=P/4mWGyVldhGd+8X9JavsKnqWkSdr5BRdzY15fml+QM=; b=TXtGdIVmyIfDzT0j0xxDxgAhXbLmgFGAG7CP7kOCvDdScSpr+yqifDdn G77awRNEGPB1tib+/yYoa4S/36LoYe2EIfpY98uWGrd/f/4N4rse+jgT4 JVTvy6ZB1ktgkHc4N/CdRBd0QOK3r6/9auDfDBbJGaif73i5g3uxKDTvK QrfyvonDpMftCyDQrTPYCirIqyeO6pycWf0mMdI9ghOno4QcFDRDfLxc3 JL1eiKYItbu63b7nX9yIIpJ/a72jQwRAf0O5qNt30irQBJTKIYq93jKL0 6/+ISSEtMpfG9EzmqQXrK9iOeCQCWwQc1c151b/XyRh/Y67EJ4ChqQTvK A==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="387176601" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="387176601" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="730001332" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="730001332" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:32 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 02/12] X86/mce/inject: Add mcgstatus for mce-inject debugfs Date: Tue, 10 Oct 2023 01:35:10 -0700 Message-Id: <062049a8ebfaa7bf14fe22ddb5e48f5b660e3332.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata To test KVM x86 to inject machine check with memory failure, it wants to to set LMCE_S (local machine check exception signaled) for memory failure injection test. The current code sets mcgstatus based on the other values and can't set LMCE_S flag. Add mcgstatus entry to allow to set mcgstatus value. Signed-off-by: Isaku Yamahata --- arch/x86/kernel/cpu/mce/inject.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index 881898a1d2f4..461858ae18f9 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -72,6 +72,7 @@ static int inj_##reg##_set(void *data, u64 val) \ return 0; \ } +MCE_INJECT_SET(mcgstatus); MCE_INJECT_SET(status); MCE_INJECT_SET(misc); MCE_INJECT_SET(addr); @@ -86,12 +87,14 @@ static int inj_##reg##_get(void *data, u64 *val) \ return 0; \ } +MCE_INJECT_GET(mcgstatus); MCE_INJECT_GET(status); MCE_INJECT_GET(misc); MCE_INJECT_GET(addr); MCE_INJECT_GET(synd); MCE_INJECT_GET(ipid); +DEFINE_SIMPLE_ATTRIBUTE(mcgstatus_fops, inj_mcgstatus_get, inj_mcgstatus_set, "%llx\n"); DEFINE_SIMPLE_ATTRIBUTE(status_fops, inj_status_get, inj_status_set, "%llx\n"); DEFINE_SIMPLE_ATTRIBUTE(misc_fops, inj_misc_get, inj_misc_set, "%llx\n"); DEFINE_SIMPLE_ATTRIBUTE(addr_fops, inj_addr_get, inj_addr_set, "%llx\n"); @@ -679,6 +682,9 @@ static const char readme_msg[] = "\t APIC interrupt handler to handle the error. \n" "\n" "ipid:\t IPID (AMD-specific)\n" +"\n" +"mcgstatus:\t Set MCG_STATUS: the bits in that MSR describes the current state\n" +"\t of the processor after the MCE.\n" "\n"; static ssize_t @@ -706,6 +712,8 @@ static struct dfs_node { { .name = "bank", .fops = &bank_fops, .perm = S_IRUSR | S_IWUSR }, { .name = "flags", .fops = &flags_fops, .perm = S_IRUSR | S_IWUSR }, { .name = "cpu", .fops = &extcpu_fops, .perm = S_IRUSR | S_IWUSR }, + { .name = "mcgstatus", .fops = &mcgstatus_fops, + .perm = S_IRUSR | S_IWUSR }, { .name = "README", .fops = &readme_fops, .perm = S_IRUSR | S_IRGRP | S_IROTH }, }; From patchwork Tue Oct 10 08:35:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415010 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 699E4CD68FE for ; Tue, 10 Oct 2023 08:35:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442919AbjJJIfj (ORCPT ); Tue, 10 Oct 2023 04:35:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442898AbjJJIfg (ORCPT ); Tue, 10 Oct 2023 04:35:36 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1450DA4; Tue, 10 Oct 2023 01:35:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926934; x=1728462934; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=m7pCDbhKS6+u2jcznAubfaRxlAgJtSYbB0zCkXhtCcE=; b=keJcEqYFaOJEYzUkdJs8Jft5Ct6gYDwV5Lf18cWLvqtj64Kif8BUg2r2 1dEaf9aaXoipwdTjWVzuXgdfobvcwluduObiJeCQoLSk3p151Kd0C2iMg 6vzMDb+Rh+CBdKeE0F0d6RfRqxJ08xEYm9VFMmGFj2zxcV9MPdrskqYoU 2srdz334BLAsk0fpy1PBsE1A+1OwIfVGqsg8Wr3Is1gaV4/Gbeg0h5eTY Q3NA/BDiabkx5f0HO6/ec9HBbokMe3edoknx9tN4qEi1Vie1f+64QI7hw SLKBWbUSf4IbLRsfWduakHttw1NZtQWtgKyBdiy8mhzx9lBm8/9t9ymJh w==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689792" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689792" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687182" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687182" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:33 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 03/12] x86/mce/inject: Add notrigger entry to suppress MCE injection Date: Tue, 10 Oct 2023 01:35:11 -0700 Message-Id: <97809b68e427922948044e33599c2fc7c9f6134c.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata The current x86 MCE injection framework injects MCE when writing to /sys/kernel/debug/mce-inject/bank. KVM wants to inject machine check on behalf of vcpu context instead of the context of writing to the bank file. Because ACPI APEI has a similar requirement and it adds /sys/kernel/debug/apei/notrigger to suppress immediate injection. By Following it, add /sys/kernel/debug/mce-inject/notrigger to suppress MCE injection. The alternative is add new value "notrigger" to /sys/kernel/debug/mce-inject/flags in addition to "sw", "hw", "df", and "th". Because it may break user space ABI, this option follow ACPI APEI error injection. Supposed usage flow: $ echo 1 > notrigger ... setup MCE values $ echo 0 > bank The last step to setup mce value to inject with MC bank 0. Originally this step injects mce. With noijnect=1, don't inject. $ echo 1 > /sys/kernel/debug/kvm/-/vcpu/mce-inject tell KVM to inject MCE in the context of vcpu. Signed-off-by: Isaku Yamahata --- arch/x86/kernel/cpu/mce/inject.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index 461858ae18f9..88603a6c0afe 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -34,6 +34,7 @@ #include "internal.h" static bool hw_injection_possible; +static u64 notrigger; /* * Collect all the MCi_XXX settings @@ -598,6 +599,8 @@ static int inj_bank_set(void *data, u64 val) } m->bank = val; + if (notrigger) + return 0; /* * sw-only injection allows to write arbitrary values into the MCA @@ -637,6 +640,21 @@ MCE_INJECT_GET(bank); DEFINE_SIMPLE_ATTRIBUTE(bank_fops, inj_bank_get, inj_bank_set, "%llu\n"); +static int inj_notrigger_get(void *data, u64 *val) +{ + *val = notrigger; + return 0; +} + +static int inj_notrigger_set(void *data, u64 val) +{ + notrigger = val; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(notrigger_fops, inj_notrigger_get, inj_notrigger_set, + "%llx\n"); + static const char readme_msg[] = "Description of the files and their usages:\n" "\n" @@ -685,6 +703,9 @@ static const char readme_msg[] = "\n" "mcgstatus:\t Set MCG_STATUS: the bits in that MSR describes the current state\n" "\t of the processor after the MCE.\n" +"\n" +"notrigger:\t Suppress triggering the injection when set to non-zero\n" +"\t The injection is triggered by other way.\n" "\n"; static ssize_t @@ -714,6 +735,8 @@ static struct dfs_node { { .name = "cpu", .fops = &extcpu_fops, .perm = S_IRUSR | S_IWUSR }, { .name = "mcgstatus", .fops = &mcgstatus_fops, .perm = S_IRUSR | S_IWUSR }, + { .name = "notrigger", .fops = ¬rigger_fops, + .perm = S_IRUSR | S_IWUSR }, { .name = "README", .fops = &readme_fops, .perm = S_IRUSR | S_IRGRP | S_IROTH }, }; From patchwork Tue Oct 10 08:35:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415013 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6C17CD6907 for ; Tue, 10 Oct 2023 08:35:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442950AbjJJIfu (ORCPT ); Tue, 10 Oct 2023 04:35:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442901AbjJJIfg (ORCPT ); Tue, 10 Oct 2023 04:35:36 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25887AF; Tue, 10 Oct 2023 01:35:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926935; x=1728462935; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Oyxf6nVEdFzk2imp0lAzFXWCTZ+GLK4NVgZs/8mH1xk=; b=h/TfGsoWI3gXGMBAzsj1D3lcmGNR3wVpm5s3/r2J9cMng+ZVmkde23Zd 5b1KPFmKjNfuhUtVKUTsNpwzkaP04K+WyN8DrfrV14hmU0699oa+w2sZw qy67EJjLwpExlbRBN64vhfkaP6chvlpLLhqzb1um+Cqxb9Fj9c0ZqaWYi zReeakMEIBP5nP0ID98LAcQa2EBJsBtbdaL8mNPo9TCnbfp+hkXbvt4qy UHx2JyYkeDDxFJqrPuV4HaVm8NYhgOBVBUcwHxFCXZNrIyW3+Bl9RPeTe OrFM24RvnUL5ZW07tQAJ8F8U/eczCgZcaEo6Y9lZuXOOyW3IpGCmIAXrC g==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689798" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689798" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687185" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687185" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:33 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 04/12] x86/mce: Move and export inject_mce() from inject.c to core.c Date: Tue, 10 Oct 2023 01:35:12 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata X86 MCE support is kernel builtin. CONFIG_X86_MCE=y or n. The x86 MCE injection framework can be kernel builtin or kernel module. CONFIG_X86_MCE_INJECT=y, m, or n. We don't want KVM depend on CONFIG_X86_MCE_INJECT=y, allow CONFIG_X86_MCE_INJECT=m for KVM mce injection. Move the necessary symbols with rename from arch/x86/kernel/cpu/mce/inject.c to arch/x86/kernel/cpu/mce/core.c and export symbols for KVM to inject MCE. Oppertunistically add lockdep_assert_held(&mce_inject_mutex) to show that injectm is protected by mce_inject_mutex. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/mce.h | 10 +++++++++ arch/x86/kernel/cpu/mce/core.c | 36 ++++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/mce/inject.c | 26 +++-------------------- 3 files changed, 49 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 180b1cbfcc4e..459066ecd922 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -265,6 +265,16 @@ int mce_notify_irq(void); DECLARE_PER_CPU(struct mce, injectm); +#ifdef CONFIG_X86_MCE +void mce_inject_lock(void); +void mce_inject_unlock(void); +void mce_inject(struct mce *m); +#else +static inline void mce_inject_lock(void) {} +static inline void mce_inject_unlock(void) {} +static inline void mce_inject(struct mce *m) {} +#endif + /* Disable CMCI/polling for MCA bank claimed by firmware */ extern void mce_disable_bank(int bank); diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 6f35f724cc14..6929c3cad278 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -129,9 +129,45 @@ void mce_setup(struct mce *m) m->ppin = cpu_data(m->extcpu).ppin; m->microcode = boot_cpu_data.microcode; } +EXPORT_SYMBOL_GPL(mce_setup); DEFINE_PER_CPU(struct mce, injectm); EXPORT_PER_CPU_SYMBOL_GPL(injectm); +static DEFINE_MUTEX(mce_inject_mutex); + +void mce_inject_lock(void) +{ + mutex_lock(&mce_inject_mutex); +} +EXPORT_SYMBOL_GPL(mce_inject_lock); + +void mce_inject_unlock(void) +{ + mutex_unlock(&mce_inject_mutex); +} +EXPORT_SYMBOL_GPL(mce_inject_unlock); + +/* Update fake mce registers on current CPU. */ +void mce_inject(struct mce *m) +{ + struct mce *i = &per_cpu(injectm, m->extcpu); + + lockdep_assert_held(&mce_inject_mutex); + + /* Make sure no one reads partially written injectm */ + i->finished = 0; + mb(); + m->finished = 0; + /* First set the fields after finished */ + i->extcpu = m->extcpu; + mb(); + /* Now write record in order, finished last (except above) */ + memcpy(i, m, sizeof(struct mce)); + /* Finally activate it */ + mb(); + i->finished = 1; +} +EXPORT_SYMBOL_GPL(mce_inject); void mce_log(struct mce *m) { diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index 88603a6c0afe..ae3efbeb78bd 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -126,25 +126,6 @@ static void setup_inj_struct(struct mce *m) m->microcode = boot_cpu_data.microcode; } -/* Update fake mce registers on current CPU. */ -static void inject_mce(struct mce *m) -{ - struct mce *i = &per_cpu(injectm, m->extcpu); - - /* Make sure no one reads partially written injectm */ - i->finished = 0; - mb(); - m->finished = 0; - /* First set the fields after finished */ - i->extcpu = m->extcpu; - mb(); - /* Now write record in order, finished last (except above) */ - memcpy(i, m, sizeof(struct mce)); - /* Finally activate it */ - mb(); - i->finished = 1; -} - static void raise_poll(struct mce *m) { unsigned long flags; @@ -176,7 +157,6 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs) } static cpumask_var_t mce_inject_cpumask; -static DEFINE_MUTEX(mce_inject_mutex); static int mce_raise_notify(unsigned int cmd, struct pt_regs *regs) { @@ -245,7 +225,7 @@ static void __maybe_unused raise_mce(struct mce *m) { int context = MCJ_CTX(m->inject_flags); - inject_mce(m); + mce_inject(m); if (context == MCJ_CTX_RANDOM) return; @@ -303,9 +283,9 @@ static int mce_inject_raise(struct notifier_block *nb, unsigned long val, if (!m) return NOTIFY_DONE; - mutex_lock(&mce_inject_mutex); + mce_inject_lock(); raise_mce(m); - mutex_unlock(&mce_inject_mutex); + mce_inject_unlock(); return NOTIFY_DONE; } From patchwork Tue Oct 10 08:35:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415015 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4F4ACD690A for ; Tue, 10 Oct 2023 08:35:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442957AbjJJIfz (ORCPT ); Tue, 10 Oct 2023 04:35:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442910AbjJJIfh (ORCPT ); Tue, 10 Oct 2023 04:35:37 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF68E97; Tue, 10 Oct 2023 01:35:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926935; x=1728462935; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mLhP+/UTxZsFYw+hdN7w9M/CFhBlJ75IonjuoNuiyMg=; b=AepkpjWUuHBO8nzkRSSp42FRTvdQYq5YuTaufHqduPJdyFtcI8Qgc8e+ SIPkSM2dksG6HeSZUWfBaKwAC/FplAXwTtHjg7bcRIvcz3lmtVQQNM7Xp uQkG1cGQple2CYdkKya2lsIek+quecTR2TgjLymJwnUEqsRoIVpfMgNVv Q501NhCFNLTIPSl8WXcFssiXDZmAQxOKaLgQtUqF5ReOx2SvhIlRyDgkj XlpFigzFo+VSgOB4MjRiKiM9KEp/ixZo9wusahVgskHN0osdfRf0mRO5x O1N2g2z8E/azkU7dW2WVy+YdX1oDJjV1TgsaEXlSGh5oLPr3G2Six7a1Z g==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689804" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689804" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687189" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687189" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:34 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 05/12] mm/fadvise: Add flags to inject hwpoison for posix_fadvise() Date: Tue, 10 Oct 2023 01:35:13 -0700 Message-Id: <8bdfcbdc1b04baba70c25d7f5c3394c6d73f2ff4.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata For VM-based confidential computing (AMD SEV-SNP and Intel TDX), the KVM guest memfd effort [1] is ongoing. It allows KVM guests to use file descriptors and their offset as protected guest memory without user-space virtual address mapping. Intel TDX uses machine checks to notify the host OS/VMM that the TDX guest consumed corrupted memory. The KVM/x86 handles machine checks for guest vcpu specially. It sets up the guest so that vcpu exits from running on machine check, checks the exit reason, and manually raises the machine check by calling do_machine_check(). To test the KVM machine check path, KVM wants to poison memory based on file descriptor and its offset. The current memory poisoning is based on the physical address, /sys/kernel/debug/hwpoison/{corrupt-pfn, unpoison-pfn}, or the virtual address, MADV_HWPOISON and MADV_UNPOISON. Add new flags FADV_HWPOISON, and FADV_UNPOISON to posix_fadvise() by following MADV_HWPOISON and MADV_UNPOISON. 9893e49d64a4 ("HWPOISON: Add madvise() based injector for hardware poisoned pages v4") The possible options would be - Add FADV flags for memory poison. This patch. - introduce IOCTL specific to KVM guest memfd - introduce debugfs entries for KVM guest memfd /sys/kernel/debug/kvm/-/guest-memfd/hwpoison/ {corrupt-offset, unoison-offset}. This options follows /sys/kernel/debug/hwpoison/{corrupt-pfn, unpoison-pfn} [1] https://lore.kernel.org/all/20230914015531.1419405-1-seanjc@google.com/ KVM guest_memfd() and per-page attributes https://lore.kernel.org/all/20230921203331.3746712-1-seanjc@google.com/ [PATCH 00/13] KVM: guest_memfd fixes Signed-off-by: Isaku Yamahata --- include/uapi/linux/fadvise.h | 4 +++ mm/fadvise.c | 58 +++++++++++++++++++++++++++++++++++- 2 files changed, 61 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h index 0862b87434c2..3699b4b0adcd 100644 --- a/include/uapi/linux/fadvise.h +++ b/include/uapi/linux/fadvise.h @@ -19,4 +19,8 @@ #define POSIX_FADV_NOREUSE 5 /* Data will be accessed once. */ #endif +/* Same to MADV_HWPOISON and MADV_SOFT_OFFLINE */ +#define FADV_HWPOISON 100 /* poison a page for testing */ +#define FADV_SOFT_OFFLINE 101 /* soft offline page for testing */ + #endif /* FADVISE_H_INCLUDED */ diff --git a/mm/fadvise.c b/mm/fadvise.c index 6c39d42f16dc..1f028a6e1d90 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -18,11 +18,62 @@ #include #include #include +#include #include #include "internal.h" +static int fadvise_inject_error(struct file *file, struct address_space *mapping, + loff_t offset, off_t endbyte, int advice) +{ + pgoff_t start_index, end_index, index, next; + struct folio *folio; + unsigned int shift; + unsigned long pfn; + int ret; + + if (!IS_ENABLED(CONFIG_MEMORY_FAILURE)) + return -EOPNOTSUPP; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (is_file_hugepages(file)) + shift = huge_page_shift(hstate_file(file)); + else + shift = PAGE_SHIFT; + start_index = offset >> shift; + end_index = endbyte >> shift; + + index = start_index; + while (index <= end_index) { + folio = filemap_get_folio(mapping, index); + if (IS_ERR(folio)) + return PTR_ERR(folio); + + next = folio_next_index(folio); + pfn = folio_pfn(folio); + if (advice == FADV_SOFT_OFFLINE) { + pr_info("Soft offlining pfn %#lx at file index %#lx\n", + pfn, index); + ret = soft_offline_page(pfn, MF_COUNT_INCREASED); + } else { + pr_info("Injecting memory failure for pfn %#lx at file index %#lx\n", + pfn, index); + ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED); + if (ret == -EOPNOTSUPP) + ret = 0; + } + + if (ret) + return ret; + index = next; + } + + return 0; +} + /* * POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could * deactivate the pages and clear PG_Referenced. @@ -57,11 +108,13 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) case POSIX_FADV_NOREUSE: case POSIX_FADV_DONTNEED: /* no bad return value, but ignore advice */ + return 0; + case FADV_HWPOISON: + case FADV_SOFT_OFFLINE: break; default: return -EINVAL; } - return 0; } /* @@ -170,6 +223,9 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) } } break; + case FADV_HWPOISON: + case FADV_SOFT_OFFLINE: + return fadvise_inject_error(file, mapping, offset, endbyte, advice); default: return -EINVAL; } From patchwork Tue Oct 10 08:35:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415014 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57983CD6907 for ; Tue, 10 Oct 2023 08:35:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442927AbjJJIfx (ORCPT ); Tue, 10 Oct 2023 04:35:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43780 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442909AbjJJIfh (ORCPT ); Tue, 10 Oct 2023 04:35:37 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F3ABB4; Tue, 10 Oct 2023 01:35:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926936; x=1728462936; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=utIbATgtjnZQ7rhJBP1bPFT2h4Ptx/Bd1cAqmmuMA8c=; b=JmcuFyA+Yojclci3aEbuTnnJS+pEihRba9w3+47Nbe8EJ0EoUsrgm3hI 04qrAtXBGMTkZX+TjuLA+xFvwgCrrb6qZ1gvyfPcnibBZbUXZB39ktoWD uwWXMMZ9FTMzIgkbi2z8m7ZQ3Q+Tr2A5nce2zBaFc8JdDpMhkl1mcnNfi V6YRNl/7DvBCxiwWFAZTMZO3XdEpMTps7R2/hqZyxqXdqJKgWZf+TMMyI jl4I7JWRIgEnRhC8Io9T0XOIh956/iZwOJRulBqlIzUIy5OWYb68GqcnR t5JveOpmEwNdclm14+joV+TjXbf9a8Oap4UQNbUsFoM7vg0lG19DoqrR3 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689812" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689812" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687193" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687193" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:34 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 06/12] mm/fadvise: Add FADV_MCE_INJECT flag for posix_fadvise() Date: Tue, 10 Oct 2023 01:35:14 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata For VM-based confidential computing (AMD SEV-SNP and Intel TDX), the KVM guest memfd effort [1] is ongoing. It allows KVM guests to use file descriptors and their offset as protected guest memory without user-space virtual address mapping. Intel TDX uses machine checks to notify the host OS/VMM that the TDX guest consumed corrupted memory. The KVM/x86 handles machine checks for guest vcpu specially. It sets up the guest so that vcpu exits from running on machine check, checks the exit reason, and manually raises the machine check by calling do_machine_check(). To test the KVM machine check path, KVM wants to inject an x86 machine check based on the file descriptor and offset. Add a new flag FADV_MCE_INJECT for posix_fadvise() to tell the x86 MCE injector the physical address. The x86 MCE injector can be kernel-builtin or kernel module. CONFIG_X86_MCE_INJECT=y, m, or n. Because we don't want posix_fadvise() to depend on the x86 MCE injector, add notifier for it to register. And the x86 MCE injector will register to the notifier chain. The possible options would be - Add FADV flags for memory poison. This patch. - Introduce a new IOCTL specific to KVM guest memfd - Introduce debugfs entries for KVM guest memfd like /sys/kernel/debug/kvm/-/guest-memfd/hwpoison/ {corrupt-offset, unoison-offset}. This options follows /sys/kernel/debug/hwpoison/ {corrupt-pfn, unpoison-pfn} [1] https://lore.kernel.org/all/20230914015531.1419405-1-seanjc@google.com/ KVM guest_memfd() and per-page attributes https://lore.kernel.org/all/20230921203331.3746712-1-seanjc@google.com/ [PATCH 00/13] KVM: guest_memfd fixes Signed-off-by: Isaku Yamahata --- include/linux/fs.h | 8 +++++ include/uapi/linux/fadvise.h | 1 + mm/fadvise.c | 62 ++++++++++++++++++++++++++++++++++++ 3 files changed, 71 insertions(+) diff --git a/include/linux/fs.h b/include/linux/fs.h index b528f063e8ff..c458f1e86245 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3377,5 +3377,13 @@ extern int vfs_fadvise(struct file *file, loff_t offset, loff_t len, int advice); extern int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice); +#ifdef CONFIG_X86_MCE_INJECT +void fadvise_register_mce_injector_chain(struct notifier_block *nb); +void fadvise_unregister_mce_injector_chain(struct notifier_block *nb); +#else +static inline void fadvise_register_mce_injector_chain(struct notifier_block *nb) {} +static inline void fadvise_unregister_mce_injector_chain(struct notifier_block *nb) {} +#endif + #endif /* _LINUX_FS_H */ diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h index 3699b4b0adcd..90e1b5a94f37 100644 --- a/include/uapi/linux/fadvise.h +++ b/include/uapi/linux/fadvise.h @@ -22,5 +22,6 @@ /* Same to MADV_HWPOISON and MADV_SOFT_OFFLINE */ #define FADV_HWPOISON 100 /* poison a page for testing */ #define FADV_SOFT_OFFLINE 101 /* soft offline page for testing */ +#define FADV_MCE_INJECT 102 #endif /* FADVISE_H_INCLUDED */ diff --git a/mm/fadvise.c b/mm/fadvise.c index 1f028a6e1d90..8168d08f7455 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -74,6 +74,65 @@ static int fadvise_inject_error(struct file *file, struct address_space *mapping return 0; } +#ifdef CONFIG_X86_MCE_INJECT +static BLOCKING_NOTIFIER_HEAD(mce_injector_chain); + +void fadvise_register_mce_injector_chain(struct notifier_block *nb) +{ + blocking_notifier_chain_register(&mce_injector_chain, nb); +} +EXPORT_SYMBOL_GPL(fadvise_register_mce_injector_chain); + +void fadvise_unregister_mce_injector_chain(struct notifier_block *nb) +{ + blocking_notifier_chain_unregister(&mce_injector_chain, nb); +} +EXPORT_SYMBOL_GPL(fadvise_unregister_mce_injector_chain); + +static void fadvise_call_mce_injector_chain(unsigned long physaddr) +{ + blocking_notifier_call_chain(&mce_injector_chain, physaddr, NULL); +} +#else +static void fadvise_call_mce_injector_chain(unsigned long physaddr) +{ +} +#endif + +static int fadvise_inject_mce(struct file *file, struct address_space *mapping, + loff_t offset) +{ + unsigned long page_mask, physaddr; + struct folio *folio; + pgoff_t index; + + if (!IS_ENABLED(CONFIG_X86_MCE_INJECT)) + return -EOPNOTSUPP; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (is_file_hugepages(file)) { + index = offset >> huge_page_shift(hstate_file(file)); + page_mask = huge_page_mask(hstate_file(file)); + } else { + index = offset >> PAGE_SHIFT; + page_mask = PAGE_MASK; + } + + folio = filemap_get_folio(mapping, index); + if (IS_ERR(folio)) + return PTR_ERR(folio); + + physaddr = (folio_pfn(folio) << PAGE_SHIFT) | (offset & ~page_mask); + folio_put(folio); + + pr_info("Injecting mce for address %#lx at file offset %#llx\n", + physaddr, offset); + fadvise_call_mce_injector_chain(physaddr); + return 0; +} + /* * POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could * deactivate the pages and clear PG_Referenced. @@ -111,6 +170,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) return 0; case FADV_HWPOISON: case FADV_SOFT_OFFLINE: + case FADV_MCE_INJECT: break; default: return -EINVAL; @@ -226,6 +286,8 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice) case FADV_HWPOISON: case FADV_SOFT_OFFLINE: return fadvise_inject_error(file, mapping, offset, endbyte, advice); + case FADV_MCE_INJECT: + return fadvise_inject_mce(file, mapping, offset); default: return -EINVAL; } From patchwork Tue Oct 10 08:35:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415017 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC7FFCD6907 for ; Tue, 10 Oct 2023 08:36:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442977AbjJJIgC (ORCPT ); Tue, 10 Oct 2023 04:36:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43794 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442921AbjJJIfk (ORCPT ); Tue, 10 Oct 2023 04:35:40 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51691B6; Tue, 10 Oct 2023 01:35:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926936; x=1728462936; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ltSCu5LbcZQkZ8VDt/3RcMj+M7LqTSv2OY7y3BW7maY=; b=ZpEJeiiZhtjBxR1u8J/utaagfBKDuvkDbH/JKu3uwKNDCmRFTSia2G86 OUEOCnUU5DZbD+/l7R/5rOJJL+PaUX8xk4MNK1trPsv9oL8XsBbqqIHlW hI1HcbSYAdgfk8hfUhsgKsPOKOponaOX+zaVPKJxLED6WT+V0+frtD5Oa JDp1bajZrg4ffHJxVlY+TC0YbKHr9TDlg4TZszvFcHFgDzau+iBns2JOO U7bNiJDi9JahWusrxBHhKlXbEEopQBw2g/hjSY1F5aJ2Kth4k2iMtUcL0 UUVPooM+4x78FA5Ibj+04dw8xLg6sMMhnwFfGyyfKZPkn+BjFColkCytO w==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689820" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689820" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687197" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687197" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:34 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 07/12] x86/mce/inject: Wire up the x86 MCE injector to FADV_MCE_INJECT Date: Tue, 10 Oct 2023 01:35:15 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata To inject machine check based on FADV_MCE_INJECT, wire up the x86 MCE injector to register its handler to get notified physical address on FADV_MCE_INJECT. Signed-off-by: Isaku Yamahata --- arch/x86/kernel/cpu/mce/inject.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index ae3efbeb78bd..43d896a89648 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -294,6 +294,17 @@ static struct notifier_block inject_nb = { .notifier_call = mce_inject_raise, }; +static int fadvise_inject_addr(struct notifier_block *nb, unsigned long val, + void *data) +{ + inj_addr_set(&i_mce, val); + return NOTIFY_DONE; +} + +static struct notifier_block fadvise_nb = { + .notifier_call = fadvise_inject_addr, +} + /* * Caller needs to be make sure this cpu doesn't disappear * from under us, i.e.: get_cpu/put_cpu. @@ -784,6 +795,7 @@ static int __init inject_init(void) register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify"); mce_register_injector_chain(&inject_nb); + fadvise_register_mce_injector_chain(&fadvise_nb); setup_inj_struct(&i_mce); @@ -795,6 +807,7 @@ static int __init inject_init(void) static void __exit inject_exit(void) { + fadvise_unregister_mce_injector_chain(&fadvise_nb); mce_unregister_injector_chain(&inject_nb); unregister_nmi_handler(NMI_LOCAL, "mce_notify"); From patchwork Tue Oct 10 08:35:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415016 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82BE9CD68FE for ; Tue, 10 Oct 2023 08:36:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442962AbjJJIf5 (ORCPT ); Tue, 10 Oct 2023 04:35:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442916AbjJJIfk (ORCPT ); Tue, 10 Oct 2023 04:35:40 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4631EA4; Tue, 10 Oct 2023 01:35:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926937; x=1728462937; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TulaiyUaTkWindo/LLjGvgcL1lDIYfNLfoQuJ8JmFgY=; b=O5d76+aeMKzgP6UP84leojsKUaAxAutmpKIo61UnHfRwwyXnFwvAuvbZ WywDWBtI9/hrMzAAQXQrK8HJfyCnMdWNjJWSvxObVvG6Nzhli5Ixz0LEJ E8HOFxvr/NaYUSBfBLbNPDW4s7ZXeZ53A3KBOJ7d9FNq5ZSd+lPkY4e/L wgiRYPXRLIFKXHG1ymlOuDdjmVmx58zlqrurByQJqP33vZ1qoldysnc03 EB5zDu38ZJEpxtsO6lPJVQlWi8Z+h5ElQE9ff3XRpu7MFOTCJagMfN04m ZmFVoyf/brC4doETi3az7BHAllpZKamLeRkHM3ka2nAkHG6oCrJdQmYp1 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689830" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689830" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687200" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687200" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:35 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 08/12] x86/mce: Define a notifier chain for mce injector Date: Tue, 10 Oct 2023 01:35:16 -0700 Message-Id: <0fd1649eb24b9757c299a89e34444630924bd2e0.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata To test KVM mechine check injection path, KVM wants to trigger machine check on its own. Not on writing to debugfs bank file of x86 mce injector. Because KVM doesn't want to depend on x86 MCE injector, define notifier chain in the x86 MCE core and make the x86 MCE injector to register the notifier chain to decouple the MCE injector and its user. This follows how dev-mcelog does. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/mce.h | 6 ++++++ arch/x86/kernel/cpu/mce/core.c | 20 ++++++++++++++++++++ arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++ 3 files changed, 44 insertions(+) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 459066ecd922..1a832a207b6a 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -269,10 +269,16 @@ DECLARE_PER_CPU(struct mce, injectm); void mce_inject_lock(void); void mce_inject_unlock(void); void mce_inject(struct mce *m); +void mce_register_atomic_injector_chain(struct notifier_block *nb); +void mce_unregister_atomic_injector_chain(struct notifier_block *nb); +void mce_call_atomic_injector_chain(unsigned long cpu); #else static inline void mce_inject_lock(void) {} static inline void mce_inject_unlock(void) {} static inline void mce_inject(struct mce *m) {} +static inline void mce_register_atomic_injector_chain(struct notifier_block *nb) {} +static inline void mce_unregister_atomic_injector_chain(struct notifier_block *nb) {} +static inline void mce_call_atomic_injector_chain(unsigned long cpu) {} #endif /* Disable CMCI/polling for MCA bank claimed by firmware */ diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 6929c3cad278..9a6b04be7404 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -169,6 +169,26 @@ void mce_inject(struct mce *m) } EXPORT_SYMBOL_GPL(mce_inject); +static ATOMIC_NOTIFIER_HEAD(mce_atomic_injector_chain); + +void mce_register_atomic_injector_chain(struct notifier_block *nb) +{ + atomic_notifier_chain_register(&mce_atomic_injector_chain, nb); +} +EXPORT_SYMBOL_GPL(mce_register_atomic_injector_chain); + +void mce_unregister_atomic_injector_chain(struct notifier_block *nb) +{ + atomic_notifier_chain_unregister(&mce_atomic_injector_chain, nb); +} +EXPORT_SYMBOL_GPL(mce_unregister_atomic_injector_chain); + +void mce_call_atomic_injector_chain(unsigned long cpu) +{ + atomic_notifier_call_chain(&mce_atomic_injector_chain, cpu, NULL); +} +EXPORT_SYMBOL_GPL(mce_call_atomic_injector_chain); + void mce_log(struct mce *m) { if (!mce_gen_pool_add(m)) diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c index 43d896a89648..5dac4dcb25aa 100644 --- a/arch/x86/kernel/cpu/mce/inject.c +++ b/arch/x86/kernel/cpu/mce/inject.c @@ -303,8 +303,24 @@ static int fadvise_inject_addr(struct notifier_block *nb, unsigned long val, static struct notifier_block fadvise_nb = { .notifier_call = fadvise_inject_addr, +}; + +static int mce_atomic_injector(struct notifier_block *nb, unsigned long val, + void *data) +{ + lockdep_assert_preemption_disabled(); + + i_mce.extcpu = val; + mce_inject(&i_mce); + setup_inj_struct(&i_mce); + + return NOTIFY_DONE; } +static struct notifier_block atomic_injector_nb = { + .notifier_call = mce_atomic_injector, +}; + /* * Caller needs to be make sure this cpu doesn't disappear * from under us, i.e.: get_cpu/put_cpu. @@ -796,6 +812,7 @@ static int __init inject_init(void) register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify"); mce_register_injector_chain(&inject_nb); fadvise_register_mce_injector_chain(&fadvise_nb); + mce_register_atomic_injector_chain(&atomic_injector_nb); setup_inj_struct(&i_mce); @@ -807,6 +824,7 @@ static int __init inject_init(void) static void __exit inject_exit(void) { + mce_unregister_atomic_injector_chain(&atomic_injector_nb); fadvise_unregister_mce_injector_chain(&fadvise_nb); mce_unregister_injector_chain(&inject_nb); unregister_nmi_handler(NMI_LOCAL, "mce_notify"); From patchwork Tue Oct 10 08:35:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415019 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5335ECD690A for ; Tue, 10 Oct 2023 08:36:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442937AbjJJIgJ (ORCPT ); Tue, 10 Oct 2023 04:36:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43826 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442920AbjJJIfk (ORCPT ); Tue, 10 Oct 2023 04:35:40 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F868A9; Tue, 10 Oct 2023 01:35:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926937; x=1728462937; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8vd+meAVtjSRzVgDs7zTq2sHPQpNFGGV0ODScm1t1Vs=; b=jOep2cxSlCfUZJ8I8yRb5GqKeoLQ8S+dEodBghJaZTSA0I6/dKb9XBgV sSFFk8Z0qJ4sgnJGuOQpu0THcEQZz4Nv6qUw/qD9uXu6kYBobw1yqzAh0 8vnQFQfuDPHwMt7rDitGD0On6MiNTVVZhUcUNf5iz+ZtprG4m6DjesRpb GM9FSPEZrcA4DUW27Fm1eh1N4KUsOVZMFUhqCs1giYDJvw6aJO3APjq08 AXd+HXndIv9u5jVJm+eoTbaDBZWxchDW6nmFy/vsUJi5a6kLbWmIRfLr2 DvDMhesyEh5owbDILiyXygA5SDvp/YlaNn36hD5jRpb9opMRkfxKtDN0u w==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689836" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689836" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687203" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687203" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:35 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 09/12] KVM: X86: Add debugfs to inject machine check on VM exit Date: Tue, 10 Oct 2023 01:35:17 -0700 Message-Id: <47c8a177d81762e0561fc47cc274076de901edbf.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata The KVM/x86 handles machine-check in the guest specially. It sets up the guest so that vcpu exits from running guests, checks the exit reason and, manually raises the machine check by calling do_machine_check(). To test the KVM machine check execution path, KVM wants to inject the machine check in the context of vcpu instead of the context of the process of MCE injection. Wire up the MCE injection framework for KVM to trigger MCE in the vcpu context. Add a kvm vcpu debugfs entry for an operator to tell KVM to inject MCE. The operation flow is as follows: - Set notrigger to 1 to tell the x86 MCE injector to suppress it from injecting machine check. echo 1 > /sys/kernel/debug/mce-inject/notrigger - Set MCE parameters via x86 MCE injector debugfs /sys/kernel/debug/mce-inject/{addr, bank, flags, mcgstatus, misc, status} - Tell KVM to inject MCE echo 1 > /sys/kernel/debug/kvm/-/vcpu/mce-inject Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/debugfs.c | 22 ++++++++++++++++++++++ arch/x86/kvm/x86.c | 14 ++++++++++++++ 3 files changed, 37 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 17715cb8731d..9286f3d02f30 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -113,6 +113,7 @@ KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_HV_TLB_FLUSH \ KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) +#define KVM_REQ_MCE_INJECT KVM_ARCH_REQ(33) #define CR0_RESERVED_BITS \ (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c index ee8c4c3496ed..fee208f30400 100644 --- a/arch/x86/kvm/debugfs.c +++ b/arch/x86/kvm/debugfs.c @@ -56,6 +56,22 @@ static int vcpu_get_tsc_scaling_frac_bits(void *data, u64 *val) DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bits, NULL, "%llu\n"); +static int vcpu_mce_inject_set(void *data, u64 val) +{ + struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (val != 1) + return -EINVAL; + kvm_make_request(KVM_REQ_MCE_INJECT, vcpu); + kvm_vcpu_kick(vcpu); + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(vcpu_mce_inject_fops, NULL, vcpu_mce_inject_set, "%llx\n"); + void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry) { debugfs_create_file("guest_mode", 0444, debugfs_dentry, vcpu, @@ -76,6 +92,12 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_ debugfs_dentry, vcpu, &vcpu_tsc_scaling_frac_fops); } + + if (IS_ENABLED(CONFIG_X86_MCE_INJECT) && + boot_cpu_has(X86_FEATURE_MCE) && boot_cpu_has(X86_FEATURE_MCA)) + debugfs_create_file("mce-inject", 0200, + debugfs_dentry, vcpu, + &vcpu_mce_inject_fops); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9f18b06bbda6..e4c63ded4c9a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10496,6 +10496,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) fastpath_t exit_fastpath; bool req_immediate_exit = false; + bool req_mce_inject = false; if (kvm_request_pending(vcpu)) { if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) { @@ -10642,6 +10643,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu)) static_call(kvm_x86_update_cpu_dirty_logging)(vcpu); + + req_mce_inject = kvm_check_request(KVM_REQ_MCE_INJECT, vcpu); } if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win || @@ -10676,6 +10679,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) goto cancel_injection; } + if (unlikely(req_mce_inject)) + mce_inject_lock(); preempt_disable(); static_call(kvm_x86_prepare_switch_to_guest)(vcpu); @@ -10721,6 +10726,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) smp_wmb(); local_irq_enable(); preempt_enable(); + if (unlikely(req_mce_inject)) { + kvm_make_request(KVM_REQ_MCE_INJECT, vcpu); + mce_inject_unlock(); + } kvm_vcpu_srcu_read_lock(vcpu); r = 1; goto cancel_injection; @@ -10814,6 +10823,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) fpu_sync_guest_vmexit_xfd_state(); static_call(kvm_x86_handle_exit_irqoff)(vcpu); + if (unlikely(req_mce_inject)) { + mce_call_atomic_injector_chain(smp_processor_id()); + kvm_machine_check(); + mce_inject_unlock(); + } if (vcpu->arch.guest_fpu.xfd_err) wrmsrl(MSR_IA32_XFD_ERR, 0); From patchwork Tue Oct 10 08:35:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415020 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1592CD6907 for ; Tue, 10 Oct 2023 08:36:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1443027AbjJJIgM (ORCPT ); Tue, 10 Oct 2023 04:36:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442929AbjJJIfl (ORCPT ); Tue, 10 Oct 2023 04:35:41 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB9EBB0; Tue, 10 Oct 2023 01:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926939; x=1728462939; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GQCMPF9yOreHoZSYiAwwtuGF3KN8BZQ7hfFsQ7RV6BI=; b=bXtY4xmIXZHQOZCQV0x2TwrkFm/HEB+plRFOae2y9eARWcxiVOO0SfaR y/cYrF9IniF9MAlZCC5JXVjdVLDTIXz5D1Eczc9RoO40AtJPmBKmSIA7w m0wyW6acZta3qV5+gCM0Ycg1K3AmqTezHn4effwEmr2JCEuuE+Rixqsel hLnyBUEXyo5OCGI0dHMHMVEpAw2D5uMHMRZWLKu9eNDEtzom++6R4oFDz zTAsUQxDp6mSnl5L+AFZRnREXpWKNGDJorcEmjh9JFEmSXxrnJSWbG+FD NRIQT4Nb+8hyhQh1CWpMlOHXoB+mZa94wR7JUUNajY1YA64EFv7N+KYNz w==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689842" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689842" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687206" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687206" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:35 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 10/12] KVM: selftests: Allow mapping guest memory without host alias Date: Tue, 10 Oct 2023 01:35:18 -0700 Message-Id: <9e970731a61711acd38b1819ab5d2eaf326e0f0f.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata For test the KVM hwpoison of memory, the recovery path unmaps the virtual memory and send SIGBUS with its address. If the poisoned page is mapped at twice or more in the virtual memory, the address in SIGBUS siginfo isn't deterministic. To make the hwpoison test cases deterministic, allow to avoid guest memory alias. Signed-off-by: Isaku Yamahata --- .../selftests/kvm/include/kvm_util_base.h | 4 ++++ tools/testing/selftests/kvm/lib/kvm_util.c | 23 ++++++++++++++----- 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h index a18db6a7b3cf..e980776a2c94 100644 --- a/tools/testing/selftests/kvm/include/kvm_util_base.h +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h @@ -435,6 +435,10 @@ void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags, uint64_t gpa, uint64_t size, void *hva); int __vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags, uint64_t gpa, uint64_t size, void *hva); +void __vm_userspace_mem_region_add(struct kvm_vm *vm, + enum vm_mem_backing_src_type src_type, + uint64_t guest_paddr, uint32_t slot, uint64_t npages, + uint32_t flags, bool alias); void vm_userspace_mem_region_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type, uint64_t guest_paddr, uint32_t slot, uint64_t npages, diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 7a8af1821f5d..310c3a760cb8 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -691,12 +691,14 @@ static void __vm_mem_region_delete(struct kvm_vm *vm, sparsebit_free(®ion->unused_phy_pages); ret = munmap(region->mmap_start, region->mmap_size); TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret)); - if (region->fd >= 0) { - /* There's an extra map when using shared memory. */ + + /* There's an extra map when using shared memory. */ + if (region->mmap_alias) { ret = munmap(region->mmap_alias, region->mmap_size); TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret)); - close(region->fd); } + if (region->fd >= 0) + close(region->fd); free(region); } @@ -920,10 +922,10 @@ void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags, * given by slot, which must be unique and < KVM_MEM_SLOTS_NUM. The * region is created with the flags given by flags. */ -void vm_userspace_mem_region_add(struct kvm_vm *vm, +void __vm_userspace_mem_region_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type, uint64_t guest_paddr, uint32_t slot, uint64_t npages, - uint32_t flags) + uint32_t flags, bool create_alias) { int ret; struct userspace_mem_region *region; @@ -1057,7 +1059,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, hash_add(vm->regions.slot_hash, ®ion->slot_node, slot); /* If shared memory, create an alias. */ - if (region->fd >= 0) { + if (region->fd >= 0 && create_alias) { region->mmap_alias = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE, vm_mem_backing_src_alias(src_type)->flag, @@ -1070,6 +1072,15 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, } } +void vm_userspace_mem_region_add(struct kvm_vm *vm, + enum vm_mem_backing_src_type src_type, + uint64_t guest_paddr, uint32_t slot, uint64_t npages, + uint32_t flags) +{ + __vm_userspace_mem_region_add(vm, src_type, guest_paddr, slot, npages, + flags, true); +} + /* * Memslot to region * From patchwork Tue Oct 10 08:35:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415018 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F791CD6907 for ; Tue, 10 Oct 2023 08:36:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442978AbjJJIgG (ORCPT ); Tue, 10 Oct 2023 04:36:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442928AbjJJIfl (ORCPT ); Tue, 10 Oct 2023 04:35:41 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B189EB4; Tue, 10 Oct 2023 01:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926939; x=1728462939; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=o5MD2FPMw4tHBthOkn4CAG942xWE4Di7Fgiy08pHbDg=; b=UIFpXu14YFLCr6Q2IJYfye6b2fpkjRbP3hH67zkmd4V0qy99cwH33bh/ J96S00sLPbeEn8WKym2vCKE6Jy01bFK0TkaHPhSslHMU47qIO/ka3ieWQ ug6KDXT2EzY6iCxq3Ouv2El3F4AnjLKtMxlUIFeFF9qG5yQ8Y/HSm8YY6 uwLDjOhG4X310sQlO/NZDzrWWtOdaC6rk8h7Rlg3DDfdu99OAJmOf+dl/ EaoJWqRuIn1zHARkEwZSlgs4Jsks56Z3ncycGV1Kp3UzN81SelplKHdg1 EjEfvu73K0HlQBY/TXGiMy1fEwIK7e00Mo/HJrIaj/dxoXj1zn35rgf5o Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689848" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689848" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687210" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687210" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:36 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 11/12] KVM: selftests: lib: Add src memory type for hwpoison test Date: Tue, 10 Oct 2023 01:35:19 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata For hwpoison test to recover hwpoisoned memory, use madvise(MADV_FREE) that requires private mapping. Add a new src memory type for it. Signed-off-by: Isaku Yamahata --- tools/testing/selftests/kvm/include/test_util.h | 2 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 7 +++++-- tools/testing/selftests/kvm/lib/test_util.c | 8 ++++++++ 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index 7e614adc6cf4..c97af4ff0699 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -111,6 +111,8 @@ enum vm_mem_backing_src_type { VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB, VM_MEM_SRC_SHMEM, VM_MEM_SRC_SHARED_HUGETLB, + VM_MEM_SRC_ANONYMOUS_MEMFD, + VM_MEM_SRC_ANONYMOUS_MEMFD_HUGETLB, NUM_SRC_TYPES, }; diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 310c3a760cb8..95ffb3b97dcf 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1007,9 +1007,12 @@ void __vm_userspace_mem_region_add(struct kvm_vm *vm, region->mmap_size += alignment; region->fd = -1; - if (backing_src_is_shared(src_type)) + if (backing_src_is_shared(src_type) || + src_type == VM_MEM_SRC_ANONYMOUS_MEMFD || + src_type == VM_MEM_SRC_ANONYMOUS_MEMFD_HUGETLB) region->fd = kvm_memfd_alloc(region->mmap_size, - src_type == VM_MEM_SRC_SHARED_HUGETLB); + src_type == VM_MEM_SRC_SHARED_HUGETLB || + src_type == VM_MEM_SRC_ANONYMOUS_MEMFD_HUGETLB); region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE, diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 5d7f28b02d73..a09bba5c074e 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -281,6 +281,14 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) */ .flag = MAP_SHARED, }, + [VM_MEM_SRC_ANONYMOUS_MEMFD] = { + .name = "anonymous_memfd", + .flag = ANON_FLAGS, + }, + [VM_MEM_SRC_ANONYMOUS_MEMFD_HUGETLB] = { + .name = "anonymous_memfd_hugetlb", + .flag = ANON_HUGE_FLAGS, + }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES, "Missing new backing src types?"); From patchwork Tue Oct 10 08:35:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 13415021 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9A26CD6907 for ; Tue, 10 Oct 2023 08:36:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1443032AbjJJIgO (ORCPT ); Tue, 10 Oct 2023 04:36:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442932AbjJJIfn (ORCPT ); Tue, 10 Oct 2023 04:35:43 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1B0DB7; Tue, 10 Oct 2023 01:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696926939; x=1728462939; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2tZvsMcEGJfe9Tdw0E0f23p4isbP+51/a3sxhYiJi84=; b=euTzzIlpVB0bX2eK9u/U09eu57dubTtiQBXDihFx3buUStXszDW5svKU BH50V+T7uyFXTJjyOnfcFTbWEpbJCL18Hwp0+3cldSKLusSRAx1I5bhaq WKDVYa7a2tZ7njyJQYW6Dn39JwH9Og+9e3LrSHL+CJ+YjDQLafLnnRMMS c6isBboNw3tpO+20JdLQqz67+T9GyegiQ0M1CgF2zrhFDnMt5GLdPIvyK McDpF1wS8InjqCvooQVcksfEuqCADHf4g54qyG30UHlqkIK5oAdM0yEPI hNjVtQjJX0JLNsGZ4BpDcwG9yVsY6ntwAK3EC06q8arGPZOMEeEx8r48P g==; X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363689854" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="363689854" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="1084687213" X-IronPort-AV: E=Sophos;i="6.03,212,1694761200"; d="scan'208";a="1084687213" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Oct 2023 01:35:36 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth , Paolo Bonzini , Sean Christopherson , linux-coco@lists.linux.dev, Chao Peng Subject: [PATCH 12/12] KVM: selftests: hwpoison/mce failure injection Date: Tue, 10 Oct 2023 01:35:20 -0700 Message-Id: <4bdb845fd5ed073fdeeb521ed0a1843bcb6ebd05.1696926843.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Isaku Yamahata Add test cases for KVM to inject hwpoison or x86 MCE. It exercises FADV_HWPOISON and KVM/x86 machine check injection. Signed-off-by: Isaku Yamahata --- tools/testing/selftests/kvm/Makefile | 1 + .../testing/selftests/kvm/mem_hwpoison_test.c | 721 ++++++++++++++++++ 2 files changed, 722 insertions(+) create mode 100644 tools/testing/selftests/kvm/mem_hwpoison_test.c diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index a3bb36fb3cfc..ec9b4c6f1e75 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -128,6 +128,7 @@ TEST_GEN_PROGS_x86_64 += hardware_disable_test TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus TEST_GEN_PROGS_x86_64 += kvm_page_table_test TEST_GEN_PROGS_x86_64 += max_guest_memory_test +TEST_GEN_PROGS_x86_64 += mem_hwpoison_test TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test TEST_GEN_PROGS_x86_64 += memslot_perf_test TEST_GEN_PROGS_x86_64 += rseq_test diff --git a/tools/testing/selftests/kvm/mem_hwpoison_test.c b/tools/testing/selftests/kvm/mem_hwpoison_test.c new file mode 100644 index 000000000000..cc4cdc6420fe --- /dev/null +++ b/tools/testing/selftests/kvm/mem_hwpoison_test.c @@ -0,0 +1,721 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022, Google LLC. + * Copyright (C) 2023, Intel Corp. + */ +#define _GNU_SOURCE /* for program_invocation_short_name */ + +#include +#include +#include + +#include +#include + +#include + +#define DATA_SLOT 10 +#define DATA_GPA ((uint64_t)(1ull << 32)) +#define DATA_SIZE ((uint64_t)(SZ_2M)) + +enum ucall_syncs { + /* Give kvm a chance to inject mce */ + GUEST_NOP, + GUEST_HWPOISON, +}; + +static void guest_sync_nop(void) +{ + ucall(UCALL_SYNC, 1, GUEST_NOP); +} + +static void guest_sync_hwpoison(uint64_t gpa) +{ + ucall(UCALL_SYNC, 2, GUEST_HWPOISON, gpa); +} + +static void memcmp_g(const char *file, unsigned int line, uint64_t gpa, + uint8_t pattern, uint64_t size) +{ + uint8_t *mem = (uint8_t *)gpa; + size_t i; + + for (i = 0; i < size; i++) { + if (mem[i] == pattern) + continue; + + ucall_assert(UCALL_ABORT, "mem[i] == pattern", + file, line, + "Expected 0x%x at offset %lu (gpa 0x%lx), got 0x%x", + pattern, i, gpa + i, mem[i]); + } +} +#define MEMCMP_G(gpa, pattern, size) \ + memcmp_g(__FILE__, __LINE__, (gpa), (pattern), (size)) + +static const uint64_t test_offsets[] = { + 0, + PAGE_SIZE, +}; + +static void guest_code(uint64_t gpa, bool huge_page, bool do_wait) +{ + const uint8_t init_p = 0xcc; + uint64_t r; + int i; + + for (i = 0; i < ARRAY_SIZE(test_offsets); i++) { + uint64_t offset = test_offsets[i]; + uint64_t base = gpa + offset; + + /* + * Set the test region to non-zero to differentiate it from the + * page newly assigned. + */ + memset((void *)gpa, init_p, SZ_2M); + if (do_wait) + /* Hold mce injector. */ + WRITE_ONCE(*(uint8_t *)gpa, 0); + + /* Ask VMM to poison the page. */ + guest_sync_hwpoison(base); + + if (do_wait) { + /* Allow mce injector to continue. */ + WRITE_ONCE(*(uint8_t *)gpa, init_p); + + /* Wait for poisoned page zeroed. */ + while (READ_ONCE(*(uint8_t *)base) == init_p) + ; + } + + /* When injecting mce, give KVM a chance to inject it. */ + guest_sync_nop(); + + /* Consume poisoned data. */ + r = READ_ONCE(*(uint64_t *)base); + /* VMM discarded the poisoned page and assign a new page. */ + GUEST_ASSERT_EQ(r, 0); + + /* Check if the page is zeroed or it keeps its contents. */ + if (huge_page) { + MEMCMP_G(gpa, 0, SZ_2M); + } else { + if (offset > 0) + MEMCMP_G(gpa, init_p, offset); + MEMCMP_G(base, 0, PAGE_SIZE); + if (offset + PAGE_SIZE < SZ_2M) + MEMCMP_G(gpa + offset + PAGE_SIZE, init_p, + SZ_2M - (offset + PAGE_SIZE)); + } + } + + GUEST_DONE(); +} + +#define MCE_INJECT_DIR "/sys/kernel/debug/mce-inject/" +#define MCE_STATUS MCE_INJECT_DIR"status" +#define MCE_MISC MCE_INJECT_DIR"misc" +#define MCE_BANK MCE_INJECT_DIR"bank" +#define MCE_FLAGS MCE_INJECT_DIR"flags" +#define MCE_MCGSTATUS MCE_INJECT_DIR"mcgstatus" +#define MCE_NOTRIGGER MCE_INJECT_DIR"notrigger" + +static void write_buf(const char *path, const char *buf, size_t size) +{ + ssize_t r; + int fd; + + fd = open(path, O_WRONLY); + TEST_ASSERT(fd >= 0, "failed to open %s\n", path); + + r = write(fd, buf, size); + TEST_ASSERT(r == size, "failed to write(%s:%s:%zd) = %zd\n", + path, buf, size, r); + + close(fd); +} + +static void mce_write(const char *path, uint64_t val) +{ + char buf[64]; + int len; + + len = sprintf(buf, "%"PRIu64"\n", val); + write_buf(path, buf, len); +} + +static void mce_flags(void) +{ + char *buf = "sw\n"; + + write_buf(MCE_FLAGS, buf, strlen(buf)); +} + +/* From asm/mce.h */ +/* MCG_STATUS register defines */ +#define MCG_STATUS_EIPV BIT_ULL(1) /* ip points to correct instruction */ +#define MCG_STATUS_MCIP BIT_ULL(2) /* machine check in progress */ +#define MCG_STATUS_LMCES BIT_ULL(3) /* LMCE signaled */ + +/* MCi_STATUS register defines */ +#define MCI_STATUS_VAL BIT_ULL(63) /* valid error */ +#define MCI_STATUS_UC BIT_ULL(61) /* uncorrected error */ +#define MCI_STATUS_EN BIT_ULL(60) /* error enabled */ +#define MCI_STATUS_MISCV BIT_ULL(59) /* misc error reg. valid */ +#define MCI_STATUS_ADDRV BIT_ULL(58) /* addr reg. valid */ +#define MCI_STATUS_AR BIT_ULL(55) /* Action required */ +#define MCI_STATUS_S BIT_ULL(56) /* Signaled machine check */ + +#define MCACOD_DATA 0x0134 /* Data Load */ + +/* MCi_MISC register defines */ +#define MCI_MISC_ADDR_MODE_SHIFT 6 +#define MCI_MISC_ADDR_PHYS 2 + +#define KVM_MCE_INJECT "/sys/kernel/debug/kvm/%d-%d/vcpu%d/mce-inject" + +/* Worst case buffer size needed for holding an integer. */ +#define ITOA_MAX_LEN 12 + +static void vcpu_inject_mce(struct kvm_vcpu *vcpu) +{ + char path[sizeof(KVM_MCE_INJECT) + ITOA_MAX_LEN * 3 + 1]; + char data[] = "0"; + ssize_t r; + int fd; + + snprintf(path, sizeof(path), KVM_MCE_INJECT, + getpid(), vcpu->vm->fd, vcpu->id); + + fd = open(path, O_WRONLY); + TEST_ASSERT(fd >= 0, "failed to open %s\n", path); + + data[0] = '0'; + r = write(fd, data, sizeof(data)); + TEST_ASSERT(r == -1 && errno == EINVAL, + "succeeded to write(%s:%s:%zd) = %zd\n", + path, data, sizeof(data), r); + + data[0] = '1'; + r = write(fd, data, sizeof(data)); + TEST_ASSERT(r == sizeof(data), + "failed to write(%s:%s:%zd) = %zd\n", + path, data, sizeof(data), r); + + close(fd); +} + +static void inject_mce(struct kvm_vcpu *vcpu, int memfd, uint64_t gpa) +{ + /* See vm_mem_add() in test_mem_failure() */ + uint64_t offset = gpa - DATA_GPA; + int ret; + + /* mce-inject in debugfs triggers mce injection. */ + mce_write(MCE_NOTRIGGER, 1); + + /* FIXME: These values are vendor specific. */ + mce_write(MCE_MCGSTATUS, + MCG_STATUS_EIPV | MCG_STATUS_MCIP | MCG_STATUS_LMCES); + mce_write(MCE_MISC, + (MCI_MISC_ADDR_PHYS << MCI_MISC_ADDR_MODE_SHIFT) | 3); + /* + * MCI_STATUS_UC: Uncorrected error: + * MCI_STATUS_EN | MCI_STATUS_AR | MCI_STATUS_S: + * SRAR: Software Recoverable Action Required + */ + mce_write(MCE_STATUS, + MCACOD_DATA | + MCI_STATUS_EN | MCI_STATUS_UC | MCI_STATUS_S | MCI_STATUS_AR | + MCI_STATUS_VAL | MCI_STATUS_MISCV | MCI_STATUS_ADDRV); + mce_flags(); + mce_write(MCE_BANK, 0); + + /* Set physical address to MCE_ADDR. */ + ret = posix_fadvise(memfd, offset, sizeof(u64), FADV_MCE_INJECT); + /* posix_fadvise() doesn't set errno, but returns erorr no. */ + if (ret) + errno = ret; + __TEST_REQUIRE(ret != EPERM, + "Injecting mcet requires CAP_SYS_ADMIN ret %d", ret); + TEST_ASSERT(!ret || ret == EBUSY, + "posix_fadvise(FADV_MCE_INJECT) should success ret %d", ret); + + /* Schedule to fire MCE on the next vcpu run. */ + vcpu_inject_mce(vcpu); +} + +static sigjmp_buf sigbuf; +static volatile void *fault_addr; + +static void sigbus_handler(int sig, siginfo_t *info, void *data) +{ + int lsb = info->si_addr_lsb; + + TEST_ASSERT(info->si_signo == SIGBUS, + "Unknown signal number expected %d received %d\n", + SIGBUS, info->si_signo); + TEST_ASSERT(info->si_code == BUS_MCEERR_AR, + "Unknown signal code received expected %d code %d\n", + BUS_MCEERR_AR, info->si_code); + TEST_ASSERT((info->si_addr_lsb == PAGE_SHIFT || + info->si_addr_lsb == HUGEPAGE_SHIFT(2)), + "Unknown signal addr_lsb expected lsb %d or %d received %d\n", + PAGE_SHIFT, HUGEPAGE_SHIFT(2), info->si_addr_lsb); + TEST_ASSERT(((intptr_t)info->si_addr >> lsb) == ((intptr_t)fault_addr >> lsb), + "Unknown signal addr expected %p received %p lsb %d\n", + fault_addr, info->si_addr, lsb); + + fault_addr = NULL; + siglongjmp(sigbuf, 1); +} + +static void sigbus_handler_set(bool set) +{ + struct sigaction sa; + int r; + + if (set) + sa = (struct sigaction) { + .sa_sigaction = sigbus_handler, + .sa_flags = SA_SIGINFO, + }; + else + sa = (struct sigaction) { + .sa_handler = SIG_DFL, + }; + + r = sigaction(SIGBUS, &sa, NULL); + TEST_ASSERT(!r, "sigaction should success"); +} + +static void discard_nop(struct kvm_vcpu *vcpu) +{ + struct kvm_run *run = vcpu->run; + struct ucall uc; + uint64_t cmd; + + vcpu_run(vcpu); + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, + "Wanted KVM_EXIT_IO, got exit reason: %u (%s)", + run->exit_reason, exit_reason_str(run->exit_reason)); + cmd = get_ucall(vcpu, &uc); + TEST_ASSERT(cmd == UCALL_SYNC, "UCALL_SYNC is expected %lu", cmd); + TEST_ASSERT(uc.args[0] == GUEST_NOP, "GUEST_NOP is expected %lu", + uc.args[0]); +} + +static void test_madvise(struct kvm_vcpu *vcpu, int memfd, bool huge_page, + uint64_t gpa) +{ + uint8_t *hva = addr_gpa2hva(vcpu->vm, gpa); + int r; + + discard_nop(vcpu); + r = madvise(hva, sizeof(u64), MADV_HWPOISON); + __TEST_REQUIRE(!(r == -1 && errno == EPERM), + "madvise(MADV_HWPOISON) requires CAP_SYS_ADMIN"); + TEST_ASSERT(!r, "madvise(MADV_HWPOISON) should succeed"); + + if (!sigsetjmp(sigbuf, 1)) { + sigbus_handler_set(true); + + /* Trigger SIGBUS */ + fault_addr = hva; + vcpu_run(vcpu); + TEST_FAIL("SIGBUS isn't triggered"); + } + + /* + * Discard poisoned page and add a new page so that guest vcpu + * can continue. + */ + if (huge_page) { + void *v; + + /* + * madvise(MADV_FREE) doesn't work for huge page. + * Resort to munmap() and mmap(). + */ + gpa = align_down(gpa, SZ_2M); + hva = addr_gpa2hva(vcpu->vm, gpa); + + r = munmap(hva, SZ_2M); + TEST_ASSERT(!r, "munmap() should succeed"); + + /* Map it again with new page for guest to continue. */ + v = mmap(hva, SZ_2M, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, + memfd, gpa - DATA_GPA); + TEST_ASSERT(v != MAP_FAILED, + __KVM_SYSCALL_ERROR("mmap()", + (int)(unsigned long)MAP_FAILED)); + TEST_ASSERT(v == hva, "mmap(MAP_FIXED) v %p hva %p", + v, hva); + } else { + fault_addr = hva; + r = madvise(hva, PAGE_SIZE, MADV_FREE); + TEST_ASSERT(!r, "madvise(MADV_FREE) should success"); + } + + sigbus_handler_set(false); +} + +static void punch_hole(struct kvm_vcpu *vcpu, int memfd, uint64_t gpa, uint64_t len) +{ + int r; + + /* Discard poisoned page. */ + r = fallocate(memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, + gpa - DATA_GPA, len); + TEST_ASSERT(!r || (r == -1 && errno == ENOENT), + "fallocate(PUNCH_HOLE, PAGE_SIZE) failed at fd = %d, offset = %lx\n", + memfd, gpa - DATA_GPA); +} + +static void punch_hole_page(struct kvm_vcpu *vcpu, int memfd, bool huge_page, + uint64_t gpa) +{ + /* Discard the poisoned page and assign new page. */ + if (huge_page) { + gpa = align_down(gpa, SZ_2M); + punch_hole(vcpu, memfd, gpa, SZ_2M); + } else + punch_hole(vcpu, memfd, gpa, PAGE_SIZE); +} + +static void munmap_hole(struct kvm_vcpu *vcpu, int memfd, uint64_t gpa, + uint64_t len) +{ + uint8_t *hva = addr_gpa2hva(vcpu->vm, gpa); + void *v; + int r; + + /* hwpoison is also recorded in the PTE entry. Clear it. */ + r = munmap(hva, len); + TEST_ASSERT(!r, "munmap() should succeed"); + + /* Map it again with new page for guest to continue. */ + v = mmap(hva, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, + memfd, gpa - DATA_GPA); + TEST_ASSERT(v != MAP_FAILED, + __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED)); + TEST_ASSERT(v == hva, + "mmap(MAP_FIXED) v %p hva %p", v, hva); +} + +static void munmap_page(struct kvm_vcpu *vcpu, int memfd, bool huge_page, + uint64_t gpa) +{ + if (huge_page) { + gpa = align_down(gpa, SZ_2M); + munmap_hole(vcpu, memfd, gpa, SZ_2M); + } else + munmap_hole(vcpu, memfd, gpa, PAGE_SIZE); +} + +static void test_fadvise(struct kvm_vcpu *vcpu, int memfd, bool huge_page, + uint64_t gpa) +{ + /* See vm_mem_add() in test_mem_failure() */ + uint64_t offset = gpa - DATA_GPA; + int ret; + + discard_nop(vcpu); + ret = posix_fadvise(memfd, offset, sizeof(u64), FADV_HWPOISON); + /* posix_fadvise() doesn't set errno, but returns erorr no. */ + if (ret) + errno = ret; + __TEST_REQUIRE(ret != EPERM, + "Injecting memory fault requires CAP_SYS_ADMIN ret %d", + ret); + TEST_ASSERT(!ret || ret == EBUSY, + "posix_fadvise(FADV_HWPOISON) should success ret %d", ret); + + sigbus_handler_set(true); + if (!sigsetjmp(sigbuf, 1)) { + /* Trigger SIGBUS */ + fault_addr = addr_gpa2hva(vcpu->vm, gpa); + vcpu_run(vcpu); + TEST_FAIL("SIGBUS isn't triggered"); + } + + punch_hole_page(vcpu, memfd, huge_page, gpa); + if (!huge_page && !sigsetjmp(sigbuf, 1)) { + fault_addr = addr_gpa2hva(vcpu->vm, gpa); + vcpu_run(vcpu); + TEST_FAIL("SIGBUS isn't triggered"); + } + munmap_page(vcpu, memfd, huge_page, gpa); + sigbus_handler_set(false); +} + +static void test_mce(struct kvm_vcpu *vcpu, int memfd, bool huge_page, + uint64_t gpa) +{ + inject_mce(vcpu, memfd, gpa); + + sigbus_handler_set(true); + if (!sigsetjmp(sigbuf, 1)) { + /* Give KVM a chance to inject MCE and trigger SIGBUS. */ + fault_addr = addr_gpa2hva(vcpu->vm, gpa); + discard_nop(vcpu); + + /* As the mce framework uses work queue, give it time. */ + sleep(1); + TEST_FAIL("SIGBUS isn't triggered"); + } + + if (!sigsetjmp(sigbuf, 1)) { + /* Trigger SIGBUS */ + fault_addr = addr_gpa2hva(vcpu->vm, gpa); + vcpu_run(vcpu); + TEST_FAIL("SIGBUS isn't triggered"); + } + + punch_hole_page(vcpu, memfd, huge_page, gpa); + if (!huge_page && !sigsetjmp(sigbuf, 1)) { + fault_addr = addr_gpa2hva(vcpu->vm, gpa); + vcpu_run(vcpu); + TEST_FAIL("SIGBUS isn't triggered"); + } + munmap_page(vcpu, memfd, huge_page, gpa); + sigbus_handler_set(false); +} + +struct inject_mce_args { + struct kvm_vcpu *vcpu; + uint64_t gpa; + int memfd; +}; + +static void *inject_mce_remote(void *args_) +{ + struct inject_mce_args *args = args_; + struct kvm_vcpu *vcpu = args->vcpu; + uint8_t *hva = addr_gpa2hva(vcpu->vm, DATA_GPA); + + /* Wait for vcpu running */ + while (!READ_ONCE(*hva)) + ; + + inject_mce(vcpu, args->memfd, args->gpa); + return NULL; +} + +static void test_mce_remote(struct kvm_vcpu *vcpu, int memfd, bool huge_page, + uint64_t gpa) +{ + uint64_t *hva = addr_gpa2hva(vcpu->vm, gpa); + struct inject_mce_args args = { + .memfd = memfd, + .vcpu = vcpu, + .gpa = gpa, + }; + pthread_t thread; + + /* Make another thread inject mce while vcpu running. */ + fault_addr = hva; + pthread_create(&thread, NULL, inject_mce_remote, &args); + + sigbus_handler_set(true); + if (!sigsetjmp(sigbuf, 1)) { + vcpu_run(vcpu); + TEST_FAIL("SIGBUS isn't triggered"); + } + pthread_join(thread, NULL); + + punch_hole_page(vcpu, memfd, huge_page, gpa); + if (!huge_page && !sigsetjmp(sigbuf, 1)) { + fault_addr = hva; + vcpu_run(vcpu); + TEST_FAIL("SIGBUS isn't triggered"); + } + + munmap_page(vcpu, memfd, huge_page, gpa); + sigbus_handler_set(false); + discard_nop(vcpu); +} + +/* How to inject failure */ +enum failure_injection { + INJECT_MADVISE, + INJECT_FADVISE, + INJECT_MCE, + INJECT_MCE_REMOTE, +}; + +struct test_args { + struct kvm_vcpu *vcpu; + enum failure_injection how; + bool huge_page; + int memfd; +}; + +static void *__test_mem_failure(void *args_) +{ + struct test_args *args = args_; + enum failure_injection how = args->how; + struct kvm_vcpu *vcpu = args->vcpu; + bool huge_page = args->huge_page; + struct kvm_run *run = vcpu->run; + int memfd = args->memfd; + struct ucall uc; + + for ( ;; ) { + vcpu_run(vcpu); + + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, + "Wanted KVM_EXIT_IO, got exit reason: %u (%s)", + run->exit_reason, exit_reason_str(run->exit_reason)); + + switch (get_ucall(vcpu, &uc)) { + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + break; + case UCALL_SYNC: { + uint64_t gpa = uc.args[1]; + + if (uc.args[0] == GUEST_NOP) + break; + + TEST_ASSERT(uc.args[0] == GUEST_HWPOISON, + "Unknown sync command '%ld'", uc.args[0]); + + switch (how) { + case INJECT_MADVISE: + test_madvise(vcpu, memfd, huge_page, gpa); + break; + case INJECT_FADVISE: + test_fadvise(vcpu, memfd, huge_page, gpa); + break; + case INJECT_MCE: + test_mce(vcpu, memfd, huge_page, gpa); + break; + case INJECT_MCE_REMOTE: + test_mce_remote(vcpu, memfd, huge_page, gpa); + break; + default: + TEST_FAIL("Unknown sync ucall %lu", uc.args[0]); + break; + } + break; + } + case UCALL_DONE: + return NULL; + default: + TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd); + } + } +} + +static void test_mem_failure(enum failure_injection how, bool huge_page) +{ + enum vm_mem_backing_src_type src_type; + struct userspace_mem_region *region; + struct test_args args; + struct kvm_vcpu *vcpu; + pthread_t thread; + struct kvm_vm *vm; + size_t size; + int r; + + if (how == INJECT_MADVISE) { + /* madvise(MADV_FREE) requires anonymous region. */ + if (huge_page) + src_type = VM_MEM_SRC_ANONYMOUS_MEMFD_HUGETLB; + else + src_type = VM_MEM_SRC_ANONYMOUS_MEMFD; + } else { + /* + * Use memfd_create() for fadvise() because * fadvise() doesn't + * work on private anonymous page. + */ + if (huge_page) + src_type = VM_MEM_SRC_SHARED_HUGETLB; + else + src_type = VM_MEM_SRC_SHMEM; + } + + vm = vm_create_with_one_vcpu(&vcpu, guest_code); + + /* + * When poisoning memory, the kernel searches mapped pages and inject + * sigbus with the virtual address. Avoid alias mapping for + * deterministic result. + */ + size = align_up(DATA_SIZE, get_backing_src_pagesz(src_type)); + __vm_userspace_mem_region_add(vm, src_type, DATA_GPA, DATA_SLOT, + size / vm->page_size, 0, false); + + region = memslot2region(vm, DATA_SLOT); + if (huge_page) { + r = madvise(region->host_mem, size, MADV_HUGEPAGE); + TEST_ASSERT(!r, "madvise(MADV_HUGEPAGE) should succeed"); + } else { + r = madvise(region->host_mem, size, MADV_NOHUGEPAGE); + TEST_ASSERT(!r, "madvise(MADV_NOHUGEPAGE) should succeed"); + } + + virt_map(vm, DATA_GPA, DATA_GPA, size / vm->page_size); + vcpu_args_set(vcpu, 3, DATA_GPA, huge_page, how == INJECT_MCE_REMOTE); + + args = (struct test_args) { + .vcpu = vcpu, + .how = how, + .huge_page = huge_page, + .memfd = region->fd, + }; + pthread_create(&thread, NULL, __test_mem_failure, &args); + + pthread_join(thread, NULL); + kvm_vm_free(vm); +} + +static void help(const char *prog_name) +{ + printf("usage: %s [-f] [-h] [-i] [-m] [-r] [-?]\n" + " -f: use fadvise for memory poison\n" + " -h: use huge page\n" + " -i: use mce injection\n" + " -m: use madvise for memory poison\n" + " -r: use mce injection remotely\n" + " -?: print this message\n", + prog_name); +} + +int main(int argc, char *argv[]) +{ + enum failure_injection how = INJECT_MADVISE; + bool huge_page = false; + int opt; + + while ((opt = getopt(argc, argv, "fhimr?")) != -1) { + switch (opt) { + case 'f': + how = INJECT_FADVISE; + break; + case 'i': + how = INJECT_MCE; + break; + case 'h': + huge_page = true; + break; + case 'm': + how = INJECT_MADVISE; + break; + case 'r': + how = INJECT_MCE_REMOTE; + break; + case '?': + default: + help(argv[0]); + exit(0); + } + } + + test_mem_failure(how, huge_page); + + return 0; +}