diff mbox series

[v3,05/12] x86/xen: set MTRR state when running as Xen PV initial domain

Message ID 20230223093243.1180-6-jgross@suse.com (mailing list archive)
State New, archived
Headers show
Series x86/mtrr: fix handling with PAT but without MTRR | expand

Commit Message

Jürgen Groß Feb. 23, 2023, 9:32 a.m. UTC
When running as Xen PV initial domain (aka dom0), MTRRs are disabled
by the hypervisor, but the system should nevertheless use correct
cache memory types. This has always kind of worked, as disabled MTRRs
resulted in disabled PAT, too, so that the kernel avoided code paths
resulting in inconsistencies. This bypassed all of the sanity checks
the kernel is doing with enabled MTRRs in order to avoid memory
mappings with conflicting memory types.

This has been changed recently, leading to PAT being accepted to be
enabled, while MTRRs stayed disabled. The result is that
mtrr_type_lookup() no longer is accepting all memory type requests,
but started to return WB even if UC- was requested. This led to
driver failures during initialization of some devices.

In reality MTRRs are still in effect, but they are under complete
control of the Xen hypervisor. It is possible, however, to retrieve
the MTRR settings from the hypervisor.

In order to fix those problems, overwrite the MTRR state via
mtrr_overwrite_state() with the MTRR data from the hypervisor, if the
system is running as a Xen dom0.

Fixes: 72cbc8f04fe2 ("x86/PAT: Have pat_enabled() properly reflect state when running on Xen")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
V3:
- move the call of mtrr_overwrite_state() to xen_pv_init_platform()
---
 arch/x86/xen/enlighten_pv.c | 50 +++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Comments

Boris Ostrovsky Feb. 24, 2023, 9 p.m. UTC | #1
On 2/23/23 4:32 AM, Juergen Gross wrote:
> +
> +	for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
> +		op.u.read_memtype.reg = reg;
> +		if (HYPERVISOR_platform_op(&op))
> +			break;


If we fail on the first iteration, do we still want to mark MTRRs are enabled/set in mtrr_overwrite_state()?


-boris


> +
> +		/*
> +		 * Only called in dom0, which has all RAM PFNs mapped at
> +		 * RAM MFNs, and all PCI space etc. is identity mapped.
> +		 * This means we can treat MFN == PFN regarding MTTR settings.
> +		 */
> +		var[reg].base_lo = op.u.read_memtype.type;
> +		var[reg].base_lo |= op.u.read_memtype.mfn << PAGE_SHIFT;
> +		var[reg].base_hi = op.u.read_memtype.mfn >> (32 - PAGE_SHIFT);
> +		mask = ~((op.u.read_memtype.nr_mfns << PAGE_SHIFT) - 1);
> +		mask &= (1UL << width) - 1;
> +		if (mask)
> +			mask |= 1 << 11;
> +		var[reg].mask_lo = mask;
> +		var[reg].mask_hi = mask >> 32;
> +	}
> +
> +	mtrr_overwrite_state(var, reg, MTRR_TYPE_UNCACHABLE);
> +#endif
> +}
Jürgen Groß Feb. 27, 2023, 7:12 a.m. UTC | #2
On 24.02.23 22:00, Boris Ostrovsky wrote:
> 
> On 2/23/23 4:32 AM, Juergen Gross wrote:
>> +
>> +	for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
>> +		op.u.read_memtype.reg = reg;
>> +		if (HYPERVISOR_platform_op(&op))
>> +			break;
> 
> 
> If we fail on the first iteration, do we still want to mark MTRRs are 
> enabled/set in mtrr_overwrite_state()?

Hmm, good idea.

I think we should just drop the call of mtrr_overwrite_state() in this
case.


Juergen
Boris Ostrovsky Feb. 27, 2023, 1:52 p.m. UTC | #3
On 2/27/23 2:12 AM, Juergen Gross wrote:
> On 24.02.23 22:00, Boris Ostrovsky wrote:
>>
>> On 2/23/23 4:32 AM, Juergen Gross wrote:
>>> +
>>> +    for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
>>> +        op.u.read_memtype.reg = reg;
>>> +        if (HYPERVISOR_platform_op(&op))
>>> +            break;
>>
>>
>> If we fail on the first iteration, do we still want to mark MTRRs are enabled/set in mtrr_overwrite_state()?
> 
> Hmm, good idea.
> 
> I think we should just drop the call of mtrr_overwrite_state() in this
> case.


TBH I am not sure what the right way is to handle errors here. What if the hypercall fails on second iteration?


-boris
Jürgen Groß Feb. 27, 2023, 1:56 p.m. UTC | #4
On 27.02.23 14:52, Boris Ostrovsky wrote:
> 
> 
> On 2/27/23 2:12 AM, Juergen Gross wrote:
>> On 24.02.23 22:00, Boris Ostrovsky wrote:
>>>
>>> On 2/23/23 4:32 AM, Juergen Gross wrote:
>>>> +
>>>> +    for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
>>>> +        op.u.read_memtype.reg = reg;
>>>> +        if (HYPERVISOR_platform_op(&op))
>>>> +            break;
>>>
>>>
>>> If we fail on the first iteration, do we still want to mark MTRRs are 
>>> enabled/set in mtrr_overwrite_state()?
>>
>> Hmm, good idea.
>>
>> I think we should just drop the call of mtrr_overwrite_state() in this
>> case.
> 
> 
> TBH I am not sure what the right way is to handle errors here. What if the 
> hypercall fails on second iteration?

The main reason would be that only one variable MTRR is available.

Its not as if there are very complicated scenarios leading to failures here.

Either the interface is usable and then it will work, or it isn't usable
and we can fall back to today's handling by ignoring MTRRs.


Juergen
diff mbox series

Patch

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index bb59cc6ddb2d..729fb447a5b6 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -68,6 +68,7 @@ 
 #include <asm/reboot.h>
 #include <asm/hypervisor.h>
 #include <asm/mach_traps.h>
+#include <asm/mtrr.h>
 #include <asm/mwait.h>
 #include <asm/pci_x86.h>
 #include <asm/cpu.h>
@@ -119,6 +120,52 @@  static int __init parse_xen_msr_safe(char *str)
 }
 early_param("xen_msr_safe", parse_xen_msr_safe);
 
+/* Get MTRR settings from Xen and put them into mtrr_state. */
+static void __init xen_set_mtrr_data(void)
+{
+#ifdef CONFIG_MTRR
+	struct xen_platform_op op = {
+		.cmd = XENPF_read_memtype,
+		.interface_version = XENPF_INTERFACE_VERSION,
+	};
+	unsigned int reg;
+	unsigned long mask;
+	uint32_t eax, width;
+	static struct mtrr_var_range var[MTRR_MAX_VAR_RANGES] __initdata;
+
+	/* Get physical address width (only 64-bit cpus supported). */
+	width = 36;
+	eax = cpuid_eax(0x80000000);
+	if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
+		eax = cpuid_eax(0x80000008);
+		width = eax & 0xff;
+	}
+
+	for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
+		op.u.read_memtype.reg = reg;
+		if (HYPERVISOR_platform_op(&op))
+			break;
+
+		/*
+		 * Only called in dom0, which has all RAM PFNs mapped at
+		 * RAM MFNs, and all PCI space etc. is identity mapped.
+		 * This means we can treat MFN == PFN regarding MTTR settings.
+		 */
+		var[reg].base_lo = op.u.read_memtype.type;
+		var[reg].base_lo |= op.u.read_memtype.mfn << PAGE_SHIFT;
+		var[reg].base_hi = op.u.read_memtype.mfn >> (32 - PAGE_SHIFT);
+		mask = ~((op.u.read_memtype.nr_mfns << PAGE_SHIFT) - 1);
+		mask &= (1UL << width) - 1;
+		if (mask)
+			mask |= 1 << 11;
+		var[reg].mask_lo = mask;
+		var[reg].mask_hi = mask >> 32;
+	}
+
+	mtrr_overwrite_state(var, reg, MTRR_TYPE_UNCACHABLE);
+#endif
+}
+
 static void __init xen_pv_init_platform(void)
 {
 	/* PV guests can't operate virtio devices without grants. */
@@ -135,6 +182,9 @@  static void __init xen_pv_init_platform(void)
 
 	/* pvclock is in shared info area */
 	xen_init_time_ops();
+
+	if (xen_initial_domain())
+		xen_set_mtrr_data();
 }
 
 static void __init xen_pv_guest_late_init(void)