From patchwork Wed Mar 30 08:34:05 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 8693671
X-Original-To: patchwork-xen-devel@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 9F648C0553;
	Wed, 30 Mar 2016 08:37:10 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 592AA2026F;
	Wed, 30 Mar 2016 08:37:09 +0000 (UTC)
Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120])
	(using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id EC3232034F;
	Wed, 30 Mar 2016 08:37:07 +0000 (UTC)
Received: from localhost ([127.0.0.1] helo=lists.xenproject.org)
	by lists.xenproject.org with esmtp (Exim 4.84_2)
	id 1alBZr-0005Hp-Do; Wed, 30 Mar 2016 08:34:11 +0000
Received: from mail6.bemta6.messagelabs.com ([85.158.143.247])
	by lists.xenproject.org with esmtp (Exim 4.84_2)
	id 1alBZq-0005Hj-GA
	for xen-devel@lists.xenproject.org; Wed, 30 Mar 2016 08:34:10 +0000
Received: from [85.158.143.35] by server-2.bemta-6.messagelabs.com
	id 2B/A9-09532-18F8BF65; Wed, 30 Mar 2016 08:34:09 +0000
X-Env-Sender: JBeulich@suse.com
X-Msg-Ref: server-11.tower-21.messagelabs.com!1459326847!6519009!1
X-Originating-IP: [137.65.248.74]
X-SpamReason: No, hits=0.5 required=7.0 tests=BODY_RANDOM_LONG
X-StarScan-Version: 8.11; banners=-,-,-
X-VirusChecked: Checked
Received: (qmail 60669 invoked from network); 30 Mar 2016 08:34:08 -0000
Received: from prv-mh.provo.novell.com (HELO prv-mh.provo.novell.com)
	(137.65.248.74) by server-11.tower-21.messagelabs.com
	with DHE-RSA-AES256-GCM-SHA384 encrypted SMTP;
	30 Mar 2016 08:34:08 -0000
Received: from INET-PRV-MTA by prv-mh.provo.novell.com
	with Novell_GroupWise; Wed, 30 Mar 2016 02:34:06 -0600
Message-Id: <56FBAB9D02000078000E12C2@prv-mh.provo.novell.com>
X-Mailer: Novell GroupWise Internet Agent 14.2.0
Date: Wed, 30 Mar 2016 02:34:05 -0600
From: "Jan Beulich"
To: "xen-devel"
Mime-Version: 1.0
Cc: Andrew Cooper, Keir Fraser
Subject: [Xen-devel] [PATCH] mwait-idle: prevent SKL-H boot failure when
	C8+C9+C10 enabled
X-BeenThere: xen-devel@lists.xen.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Xen developer discussion
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel"
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED,
	UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Some SKL-H configurations require "max_cstate=7" to boot.
While that is an effective workaround, it disables C10.

This patch detects the problematic configuration,
and disables C8 and C9, keeping C10 enabled.

Note that enabling SGX in BIOS SETUP can also prevent this issue,
if the system BIOS provides that option.

https://bugzilla.kernel.org/show_bug.cgi?id=109081
"Freezes with Intel i7 6700HQ (Skylake), unless intel_idle.max_cstate=7"

Signed-off-by: Len Brown

Adjust to Xen infrastructure.
Signed-off-by: Jan Beulich

--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -60,7 +60,7 @@
 #include
 #include
 
-#define MWAIT_IDLE_VERSION "0.4"
+#define MWAIT_IDLE_VERSION "0.4.1"
 #undef PREFIX
 #define PREFIX "mwait-idle: "
 
@@ -100,6 +100,7 @@ static const struct cpuidle_state {
 	unsigned int	target_residency; /* in US */
 } *cpuidle_state_table;
 
+#define CPUIDLE_FLAG_DISABLED		0x1
 /*
  * Set this flag for states where the HW flushes the TLB for us
  * and so we don't need cross-calls to keep it consistent.
@@ -477,7 +478,7 @@ static const struct cpuidle_state bdw_cs
 	{}
 };
 
-static const struct cpuidle_state skl_cstates[] = {
+static struct cpuidle_state skl_cstates[] = {
 	{
 		.name = "C1-SKL",
 		.flags = MWAIT2flg(0x00),
@@ -781,34 +782,84 @@ static const struct x86_cpu_id intel_idl
 };
 
 /*
- * mwait_idle_state_table_update()
- *
- * Update the default state_table for this CPU-id
+ * ivt_idle_state_table_update(void)
  *
- * Currently used to access tuned IVT multi-socket targets
+ * Tune IVT multi-socket targets
  * Assumption: num_sockets == (max_package_num + 1)
  */
-static void __init mwait_idle_state_table_update(void)
+static void __init ivt_idle_state_table_update(void)
 {
 	/* IVT uses a different table for 1-2, 3-4, and > 4 sockets */
-	if (boot_cpu_data.x86_model == 0x3e) { /* IVT */
-		unsigned int cpu, max_apicid = boot_cpu_physical_apicid;
+	unsigned int cpu, max_apicid = boot_cpu_physical_apicid;
 
-		for_each_present_cpu(cpu)
-			if (max_apicid < x86_cpu_to_apicid[cpu])
-				max_apicid = x86_cpu_to_apicid[cpu];
-		switch (apicid_to_socket(max_apicid)) {
-		case 0: case 1:
-			/* 1 and 2 socket systems use default ivt_cstates */
-			break;
-		case 2: case 3:
-			cpuidle_state_table = ivt_cstates_4s;
-			break;
-		default:
-			cpuidle_state_table = ivt_cstates_8s;
-			break;
-		}
+	for_each_present_cpu(cpu)
+		if (max_apicid < x86_cpu_to_apicid[cpu])
+			max_apicid = x86_cpu_to_apicid[cpu];
+	switch (apicid_to_socket(max_apicid)) {
+	case 0: case 1:
+		/* 1 and 2 socket systems use default ivt_cstates */
+		break;
+	case 2: case 3:
+		cpuidle_state_table = ivt_cstates_4s;
+		break;
+	default:
+		cpuidle_state_table = ivt_cstates_8s;
+		break;
+	}
+}
+
+/*
+ * sklh_idle_state_table_update(void)
+ *
+ * On SKL-H (model 0x5e) disable C8 and C9 if:
+ * C10 is enabled and SGX disabled
+ */
+static void sklh_idle_state_table_update(void)
+{
+	u64 msr;
+
+	/* if PC10 disabled via cmdline max_cstate=7 or shallower */
+	if (max_cstate <= 7)
+		return;
+
+	/* if PC10 not present in CPUID.MWAIT.EDX */
+	if ((mwait_substates & (MWAIT_CSTATE_MASK << 28)) == 0)
+		return;
+
+	rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr);
+
+	/* PC10 is not enabled in PKG C-state limit */
+	if ((msr & 0xF) != 8)
+		return;
+
+	/* if SGX is present */
+	if (boot_cpu_has(X86_FEATURE_SGX)) {
+		rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
+
+		/* if SGX is enabled */
+		if (msr & (1 << 18))
+			return;
 	}
+
+	skl_cstates[5].flags |= CPUIDLE_FLAG_DISABLED;	/* C8-SKL */
+	skl_cstates[6].flags |= CPUIDLE_FLAG_DISABLED;	/* C9-SKL */
+}
+
+/*
+ * mwait_idle_state_table_update()
+ *
+ * Update the default state_table for this CPU-id
+ */
+static void __init mwait_idle_state_table_update(void)
+{
+	switch (boot_cpu_data.x86_model) {
+	case 0x3e: /* IVT */
+		ivt_idle_state_table_update();
+		break;
+	case 0x5e: /* SKL-H */
+		sklh_idle_state_table_update();
+		break;
+	}
 }
 
 static int __init mwait_idle_probe(void)
@@ -897,6 +948,14 @@ static int mwait_idle_cpu_init(struct no
 		if (num_substates == 0)
 			continue;
 
+		/* if state marked as disabled, skip it */
+		if (cpuidle_state_table[cstate].flags &
+		    CPUIDLE_FLAG_DISABLED) {
+			printk(XENLOG_DEBUG PREFIX "state %s is disabled",
+			       cpuidle_state_table[cstate].name);
+			continue;
+		}
+
 		if (dev->count >= ACPI_PROCESSOR_MAX_POWER) {
 			printk(PREFIX "max C-state count of %u reached\n",
 			       ACPI_PROCESSOR_MAX_POWER);
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -288,6 +288,7 @@
 #define MSR_IA32_PLATFORM_ID		0x00000017
 #define MSR_IA32_EBL_CR_POWERON		0x0000002a
 #define MSR_IA32_EBC_FREQUENCY_ID	0x0000002c
+#define MSR_IA32_FEATURE_CONTROL	0x0000003a
 #define MSR_IA32_TSC_ADJUST		0x0000003b
 
 #define MSR_IA32_APICBASE		0x0000001b
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -186,6 +186,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
 XEN_CPUFEATURE(FSGSBASE,      5*32+ 0) /* {RD,WR}{FS,GS}BASE instructions */
 XEN_CPUFEATURE(TSC_ADJUST,    5*32+ 1) /* TSC_ADJUST MSR available */
+XEN_CPUFEATURE(SGX,           5*32+ 2) /* Software Guard extensions */
 XEN_CPUFEATURE(BMI1,          5*32+ 3) /* 1st bit manipulation extensions */
 XEN_CPUFEATURE(HLE,           5*32+ 4) /* Hardware Lock Elision */
 XEN_CPUFEATURE(AVX2,          5*32+ 5) /* AVX2 instructions */
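
For anyone wanting to reason about the SKL-H check outside the hypervisor, the decision sequence in sklh_idle_state_table_update() can be modelled as a pure function. The following is only a sketch and not part of the patch: struct sklh_inputs, struct idle_state, sklh_must_disable_c8_c9() and sklh_apply() are hypothetical names, the hardware reads (CPUID, rdmsrl) are replaced by plain fields, and MWAIT_CSTATE_MASK is assumed to be 0xF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constants mirror the patch; MWAIT_CSTATE_MASK == 0xF is an assumption. */
#define MWAIT_CSTATE_MASK      0xFu
#define CPUIDLE_FLAG_DISABLED  0x1u

/* Hypothetical snapshot of the values the real code reads from hardware. */
struct sklh_inputs {
	unsigned int max_cstate;   /* command-line C-state limit */
	uint32_t mwait_substates;  /* CPUID.MWAIT.EDX */
	uint64_t pkg_cst_cfg_ctl;  /* MSR_NHM_SNB_PKG_CST_CFG_CTL */
	bool has_sgx;              /* SGX feature bit reported by CPUID */
	uint64_t feature_control;  /* MSR_IA32_FEATURE_CONTROL */
};

/* Hypothetical, reduced stand-in for struct cpuidle_state. */
struct idle_state {
	const char *name;
	unsigned int flags;
};

/* Same early-return order as the patch: true means C8 and C9 get disabled. */
static bool sklh_must_disable_c8_c9(const struct sklh_inputs *in)
{
	if (in->max_cstate <= 7)                          /* PC10 already excluded */
		return false;
	if ((in->mwait_substates & (MWAIT_CSTATE_MASK << 28)) == 0)
		return false;                             /* PC10 not enumerated */
	if ((in->pkg_cst_cfg_ctl & 0xF) != 8)             /* PC10 not the pkg limit */
		return false;
	if (in->has_sgx && (in->feature_control & (1u << 18)))
		return false;                             /* SGX enabled */
	return true;
}

/* Flag entries 5 and 6 if needed, then count the states left usable. */
static size_t sklh_apply(struct idle_state *tbl, size_t n,
			 const struct sklh_inputs *in)
{
	size_t i, usable = 0;

	assert(n > 6); /* the table must contain the C8 and C9 slots */
	if (sklh_must_disable_c8_c9(in)) {
		tbl[5].flags |= CPUIDLE_FLAG_DISABLED;
		tbl[6].flags |= CPUIDLE_FLAG_DISABLED;
	}
	for (i = 0; i < n; i++)
		if (!(tbl[i].flags & CPUIDLE_FLAG_DISABLED))
			usable++;
	return usable;
}
```

With an eight-entry table and inputs describing PC10 as enumerated, selected as the package limit, and SGX absent or disabled, sklh_apply() leaves six usable states. Setting bit 18 of feature_control on an SGX-capable part, or lowering max_cstate to 7, leaves all eight untouched.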