diff mbox series

Possible minor CPU bug on Zen2 in regard to using very high GPA in a VM

Message ID f8071f73869de34961ea1a35177fc778bb99d4b7.camel@redhat.com (mailing list archive)
State New, archived
Headers show
Series Possible minor CPU bug on Zen2 in regard to using very high GPA in a VM | expand

Commit Message

Maxim Levitsky Aug. 4, 2021, 12:59 p.m. UTC
Hi!
 
I recently triaged a series of failures that I am seeing on both of my AMD machines in the kvm selftests.

One test failed due to a trivial typo, to which I had sent a fix, but most of the other tests failed
due to what I now suspect to be a very minor but still a CPU bug.
 
All of the failing tests except two tests that timeout (and I haven't yet triaged them),
use the perf_test_util.c library.
All of these fail with SHUTDOWN exit reason.

After a relatively recent commit ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()"),
vm_get_max_gfn() was fixed to return the maximum GFN that the guest can use.
For default VM type this value is obtained from 'vm->pa_bit's which is in turn obtained
from guest's cpuid in kvm_get_cpu_address_width function.
 
It is 48 on both my AMD machines (3970X and 4650U) and also on remote EPYC 7302P machine.
(all of them are Zen2 machines)
 
My 3970X has SME enabled by BIOS, while my 4650U doesn't have it enabled.
The 7302P also has SME enabled.
SEV was obviously not enabled for the test.
NPT was enabled.
 
It appears that if the guest uses any GPA above 0xFFFCFFFFF000 in its guest paging tables, 
then it gets #PF with reserved bits error code.
 
That causes the guest to shutdown because the kvm unit tests don't setup exception handling
(I think).
 
I used my 'intercept all exceptions' debug feature to enable #PF intercept which allowed
me to clearly see the #PF with a reserved bit reason happening prior to shutdown.
 
I attached a simple reproducer for this.
 

PS:
 
I did my best to check that this isn't a code/compiler bug in the selftests.
 
I did find one bug (which one can even claim to be a compiler bug,
but I think that due to very undefined nature of bitfields, the compiler
is allowed to do this):
 
In addr_gva2gpa we have this code 
 
return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
 
The pfn is declared as 'uint64_t pfn:40'
 
When the PTE is set to 'fffd00000003' for example,
this code for some reason returns 0xfd00000000 instead of '0xfffd00000000'
 
If 'pte[index[0]].pfn' is copied to uint64_t variable and then 
shifted / multiplied by the page size then the correct value is printed.
 
This was tested on both gcc 10.2.1 that comes with fedora 32,  gcc 11.1.1 that 
comes with fedora 34 and gcc 8.5.0 that comes with RHEL 8.5.0.
 
However the raw PTE value does seem to be correctly set, so it looks like
this problem is not related to the possible CPU bug I found.
 
The reproducer I attached has few test 'printf's for this issue as well.
 
Best regards,
	Maxim Levitsky

Comments

Sean Christopherson Aug. 4, 2021, 3:42 p.m. UTC | #1
On Wed, Aug 04, 2021, Maxim Levitsky wrote:
> Hi!
>  
> I recently triaged a series of failures that I am seeing on both of my AMD machines in the kvm selftests.
> 
> One test failed due to a trivial typo, to which I had sent a fix, but most of the other tests failed
> due to what I now suspect to be a very minor but still a CPU bug.
>  
> All of the failing tests except two tests that timeout (and I haven't yet triaged them),
> use the perf_test_util.c library.
> All of these fail with SHUTDOWN exit reason.
> 
> After a relatively recent commit ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()"),
> vm_get_max_gfn() was fixed to return the maximum GFN that the guest can use.
> For default VM type this value is obtained from 'vm->pa_bit's which is in turn obtained
> from guest's cpuid in kvm_get_cpu_address_width function.
>  
> It is 48 on both my AMD machines (3970X and 4650U) and also on remote EPYC 7302P machine.
> (all of them are Zen2 machines)
>  
> My 3970X has SME enabled by BIOS, while my 4650U doesn't have it enabled.
> The 7302P also has SME enabled.
> SEV was obviously not enabled for the test.
> NPT was enabled.
>  
> It appears that if the guest uses any GPA above 0xFFFCFFFFF000 in its guest paging tables, 
> then it gets #PF with reserved bits error code.

LOL, I encountered this joy a few weeks back.  There's a magic Hyper-Transport
region at the top of memory that is reserved, even for GPAs.  You and I say
"CPU BUG!!!", AMD says "working as intended" ;-)

https://lkml.kernel.org/r/20210625020354.431829-2-seanjc@google.com
Dr. David Alan Gilbert Aug. 4, 2021, 3:58 p.m. UTC | #2
* Sean Christopherson (seanjc@google.com) wrote:
> On Wed, Aug 04, 2021, Maxim Levitsky wrote:
> > Hi!
> >  
> > I recently triaged a series of failures that I am seeing on both of my AMD machines in the kvm selftests.
> > 
> > One test failed due to a trivial typo, to which I had sent a fix, but most of the other tests failed
> > due to what I now suspect to be a very minor but still a CPU bug.
> >  
> > All of the failing tests except two tests that timeout (and I haven't yet triaged them),
> > use the perf_test_util.c library.
> > All of these fail with SHUTDOWN exit reason.
> > 
> > After a relatively recent commit ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()"),
> > vm_get_max_gfn() was fixed to return the maximum GFN that the guest can use.
> > For default VM type this value is obtained from 'vm->pa_bit's which is in turn obtained
> > from guest's cpuid in kvm_get_cpu_address_width function.
> >  
> > It is 48 on both my AMD machines (3970X and 4650U) and also on remote EPYC 7302P machine.
> > (all of them are Zen2 machines)
> >  
> > My 3970X has SME enabled by BIOS, while my 4650U doesn't have it enabled.
> > The 7302P also has SME enabled.
> > SEV was obviously not enabled for the test.
> > NPT was enabled.
> >  
> > It appears that if the guest uses any GPA above 0xFFFCFFFFF000 in its guest paging tables, 
> > then it gets #PF with reserved bits error code.
> 
> LOL, I encountered this joy a few weeks back.  There's a magic Hyper-Transport
> region at the top of memory that is reserved, even for GPAs.  You and I say
> "CPU BUG!!!", AMD says "working as intended" ;-)
> 
> https://lkml.kernel.org/r/20210625020354.431829-2-seanjc@google.com

Hmm, it might be nice if in one of the AMD manuals there was a list of
all address spaces and for each one, a list of things we should expect
to be surprised by.

Dave
Maxim Levitsky Aug. 4, 2021, 4:04 p.m. UTC | #3
On Wed, 2021-08-04 at 15:42 +0000, Sean Christopherson wrote:
> On Wed, Aug 04, 2021, Maxim Levitsky wrote:
> > Hi!
> >  
> > I recently triaged a series of failures that I am seeing on both of my AMD machines in the kvm selftests.
> > 
> > One test failed due to a trivial typo, to which I had sent a fix, but most of the other tests failed
> > due to what I now suspect to be a very minor but still a CPU bug.
> >  
> > All of the failing tests except two tests that timeout (and I haven't yet triaged them),
> > use the perf_test_util.c library.
> > All of these fail with SHUTDOWN exit reason.
> > 
> > After a relatively recent commit ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()"),
> > vm_get_max_gfn() was fixed to return the maximum GFN that the guest can use.
> > For default VM type this value is obtained from 'vm->pa_bit's which is in turn obtained
> > from guest's cpuid in kvm_get_cpu_address_width function.
> >  
> > It is 48 on both my AMD machines (3970X and 4650U) and also on remote EPYC 7302P machine.
> > (all of them are Zen2 machines)
> >  
> > My 3970X has SME enabled by BIOS, while my 4650U doesn't have it enabled.
> > The 7302P also has SME enabled.
> > SEV was obviously not enabled for the test.
> > NPT was enabled.
> >  
> > It appears that if the guest uses any GPA above 0xFFFCFFFFF000 in its guest paging tables, 
> > then it gets #PF with reserved bits error code.
> 
> LOL, I encountered this joy a few weeks back.  There's a magic Hyper-Transport
> region at the top of memory that is reserved, even for GPAs.  You and I say
> "CPU BUG!!!", AMD says "working as intended" ;-)
> 
> https://lkml.kernel.org/r/20210625020354.431829-2-seanjc@google.com
> 

Cool!


It this documented anywhere? I guess KVM selftests tests can't read MSRs,
so even if this is exposed to a MSR, that would be useless.

Is it related to 'External Access Protection' stuff in the AMD's manual?

I guess I can at least do something like that in the unit tests:
"If max physical bits == 48 and machine == AMD then max GPA = 0xFFFCFFFFF000".

I'll prepare a patch for this if you agree.

Or I can fix the unit test to not allocate from the very top of the GPA space.
Dropping one bit from max physical address is enough to workaround this.

What do you think?

Best regards,
	Maxim Levitsky
diff mbox series

Patch

From 8a5ac0b711ac062897c16b9289989969cb66e5e8 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk@redhat.com>
Date: Wed, 4 Aug 2021 12:40:03 +0300
Subject: [PATCH] KVM: selftests: add large GPA reproducer

---
 tools/testing/selftests/kvm/.gitignore        |  1 +
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../selftests/kvm/lib/x86_64/processor.c      |  8 +++
 .../selftests/kvm/x86_64/large_gpa_test.c     | 56 +++++++++++++++++++
 4 files changed, 66 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/large_gpa_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 36896d251977..b89fc9886b54 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -39,6 +39,7 @@ 
 /x86_64/xss_msr_test
 /x86_64/vmx_pmu_msrs_test
 /x86_64/vmx_pi_mmio_test
+/x86_64/large_gpa_tests
 /access_tracking_perf_test
 /demand_paging_test
 /dirty_log_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c103873531e0..6c1ff2594607 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -84,6 +84,7 @@  TEST_GEN_PROGS_x86_64 += memslot_perf_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
+TEST_GEN_PROGS_x86_64 += x86_64/large_gpa_test
 
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 28cb881f440d..b564298b9b4d 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -587,6 +587,14 @@  vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
 	if (!pte[index[0]].present)
 		goto unmapped_gva;
 
+	uint64_t raw_value = *(uint64_t*)&pte[index[0]];
+	uint64_t pfn = pte[index[0]].pfn;
+
+	printf("Raw PTE value is %lx\n", raw_value);
+	printf("Raw PFN value is %lx\n", pfn);
+	printf("Raw PFN shifted by 12 bit value is %lx\n", pfn << 12ULL);
+	printf("Raw PFN shifted by 12 bit value is %lx\n", (pte[index[0]].pfn) << 12ULL);
+
 	return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
 
 unmapped_gva:
diff --git a/tools/testing/selftests/kvm/x86_64/large_gpa_test.c b/tools/testing/selftests/kvm/x86_64/large_gpa_test.c
new file mode 100644
index 000000000000..c0ef9450e885
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/large_gpa_test.c
@@ -0,0 +1,56 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021, Red Hat, Inc.
+ */
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID 0
+#define EXTRA_SLOT_INDEX 1
+#define EXTRA_SLOT_NUMPAGES 1
+
+//#define EXTRA_SLOT_GPA 0x0000FFFFFFFFF000 // fails
+//#define EXTRA_SLOT_GPA 0x0000FFFDFFFFF000 // fails
+#define EXTRA_SLOT_GPA   0x0000FFFD00000000 // fails
+//#define EXTRA_SLOT_GPA 0x0000FFFCFFFFF000 // works
+//#define EXTRA_SLOT_GPA 0x0000FFFBFFFFF000 // works
+
+#define MAP_GVA 0x80000000 // doesn't seem to matter
+
+static void guest_main(void)
+{
+	volatile uint32_t* ptr = (uint32_t* )MAP_GVA;
+	*ptr = 0xC0FFEE;
+	GUEST_DONE();
+}
+
+int main(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+	uint64_t result_gpa;
+
+	vm = vm_create_default(VCPU_ID, 0, guest_main);
+	run = vcpu_state(vm, VCPU_ID);
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+					EXTRA_SLOT_GPA,
+					EXTRA_SLOT_INDEX,
+					EXTRA_SLOT_NUMPAGES, 0);
+
+	virt_map(vm, MAP_GVA, EXTRA_SLOT_GPA, EXTRA_SLOT_NUMPAGES);
+
+	result_gpa = addr_gva2gpa(vm, MAP_GVA);
+
+	printf("mapped GPA is 0x%lx\n", result_gpa);
+
+	_vcpu_run(vm, VCPU_ID);
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "unexpected exit reason: %u (%s),\n",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+
+	kvm_vm_free(vm);
+}
-- 
2.26.3