Message ID | 20220425145530.723858-1-alexandru.elisei@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set | expand |
[+ Oliver] Hi Alex, On Mon, 25 Apr 2022 15:55:30 +0100, Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > kvm->arch.arm_pmu is set when userspace attempts to set the first PMU > attribute. As certain attributes are mandatory, arm_pmu ends up always > being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU. > However, this only happens if the VCPU has the PMU feature. If the VCPU > doesn't have the feature bit set, kvm->arch.arm_pmu will be left > uninitialized and equal to NULL. Although I'm not opposed to this as an immediate workaround to avoid the ugly crash, I think sanitising the AArch32 regs is the way to go. Oliver had a stab at this a few weeks back[1], but this seem to have stalled. Could you have a look and see if anything was missing (the patches needed some rework, but I haven't checked whether DFR0 was correctly handled or not). Thanks, M. [1] https://lore.kernel.org/r/20220401010832.3425787-1-oupton@google.com
Howdy, On Mon, Apr 25, 2022 at 10:14 AM Marc Zyngier <maz@kernel.org> wrote: > Although I'm not opposed to this as an immediate workaround to avoid > the ugly crash, I think sanitising the AArch32 regs is the way to go. > Oliver had a stab at this a few weeks back[1], but this seem to have > stalled. Planning on sending out a new spin this week, I had been juggling a few too many things since mailing that one out :) I was honestly surprised nothing had blown up yet! -- Thanks, Oliver
Hi Alex, On Mon, Apr 25, 2022 at 03:55:30PM +0100, Alexandru Elisei wrote: [...] > The root cause remains the same: kvm->arch.pmuver was never set to > something sensible because the VCPU feature itself was never set. > > The odroid-c4 is somewhat of a special case, because Linux doesn't probe > the PMU. But the above errors can easily be reproduced on any hardware, > with or without a PMU driver, as long as userspace doesn't set the PMU > feature. This note has me wondering if we could do more negative testing with kvm-unit-tests just by selectively turning on/off features, with the expectation that tests either skip or pass. > Work around the fact that KVM advertises a PMU even when the VCPU feature > is not set by gating all PMU emulation on the feature. The guest can still > access the registers without KVM injecting an undefined exception. We're going to need something similar even after KVM conditionally advertises the PMU. WDYT about wiring up sys_reg_desc::visibility for the AArch32 PMU registers? For now just treat them as REG_RAZ (probably extend this to imply WI too) then promote to REG_HIDDEN in a later patch. -- Thanks, Oliver
On Tue, Apr 26, 2022 at 08:05:11AM +0000, Oliver Upton wrote: > Hi Alex, > > On Mon, Apr 25, 2022 at 03:55:30PM +0100, Alexandru Elisei wrote: > > [...] > > > The root cause remains the same: kvm->arch.pmuver was never set to > > something sensible because the VCPU feature itself was never set. > > > > The odroid-c4 is somewhat of a special case, because Linux doesn't probe > > the PMU. But the above errors can easily be reproduced on any hardware, > > with or without a PMU driver, as long as userspace doesn't set the PMU > > feature. > > This note has me wondering if we could do more negative testing with > kvm-unit-tests just by selectively turning on/off features, with the > expectation that tests either skip or pass. I'm not sure that that can be accomplished right now. kvm-unit-tests supports only qemu as an automated test runner, and qemu enables the PMU by default. I don't know if it can be disabled, it would be nice if it could. I stumbled upon this by mistake, when I ran kvmtool without enabling the PMU (the default in kvmtool is to not have it enabled). If it is possible to disable PMU emulation from the qemu's command line, then it should be as simple as writing a test that expects all PMU register accesses to trigger an undefined exception (and adding a new test definition). If it is not possible to do this with qemu, then we would have to wait until kvm-unit-tests can use kvmtool as the test runner. I have an RFC sent for that [1], I need to get back to it. Another option would be to have this as a kselftest, although I don't know how easy it is to register an exception handler in a kselftest. The test could be further expanded to other registers gated by a VCPU feature being set. [1] https://lore.kernel.org/kvmarm/20210702163122.96110-1-alexandru.elisei@arm.com/ > > > Work around the fact that KVM advertises a PMU even when the VCPU feature > > is not set by gating all PMU emulation on the feature. The guest can still > > access the registers without KVM injecting an undefined exception. > > We're going to need something similar even after KVM conditionally > advertises the PMU. > > WDYT about wiring up sys_reg_desc::visibility for the AArch32 PMU > registers? For now just treat them as REG_RAZ (probably extend this to > imply WI too) then promote to REG_HIDDEN in a later patch. I was thinking you can simply use .visibility = pmu_visibility, like it's done with the PMU_SYS_REG macro: --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -2014,19 +2014,22 @@ static const struct sys_reg_desc cp14_64_regs[] = { { Op1( 0), CRm( 2), .access = trap_raz_wi }, }; +#define CP15_PMU_SYS_REG(_map, _Op1, _CRn, _CRm, _Op2) \ + AA32(_map), \ + Op1(_Op1), CRn(_CRn), CRm(_CRm), Op2(_Op2), \ + .visibility = pmu_visibility + /* Macro to expand the PMEVCNTRn register */ #define PMU_PMEVCNTR(n) \ - /* PMEVCNTRn */ \ - { Op1(0), CRn(0b1110), \ - CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ - access_pmu_evcntr } + { CP15_PMU_SYS_REG(DIRECT, 0, 0b1110, \ + (0b1000 | (((n) >> 3) & 0x3)), ((n) & 0x7)), \ + .access = access_pmu_evcntr } /* Macro to expand the PMEVTYPERn register */ #define PMU_PMEVTYPER(n) \ - /* PMEVTYPERn */ \ - { Op1(0), CRn(0b1110), \ - CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ - access_pmu_evtyper } + { CP15_PMU_SYS_REG(DIRECT, 0, 0b1110, \ + (0b1100 | (((n) >> 3) & 0x3)), ((n) & 0x7)), \ + .access = access_pmu_evtyper } /* * Trapped cp15 registers. TTBR0/TTBR1 get a double encoding, @@ -2067,25 +2070,25 @@ static const struct sys_reg_desc cp15_regs[] = { { Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw }, /* PMU */ - { Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr }, - { Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmcnten }, - { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmcnten }, - { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmovs }, - { Op1( 0), CRn( 9), CRm(12), Op2( 4), access_pmswinc }, - { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr }, - { AA32(LO), Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid }, - { AA32(LO), Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmceid }, - { Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_evcntr }, - { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_evtyper }, - { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_evcntr }, - { Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmuserenr }, - { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pminten }, - { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pminten }, - { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmovs }, - { AA32(HI), Op1( 0), CRn( 9), CRm(14), Op2( 4), access_pmceid }, - { AA32(HI), Op1( 0), CRn( 9), CRm(14), Op2( 5), access_pmceid }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 0), .access = access_pmcr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 1), .access = access_pmcnten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 2), .access = access_pmcnten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 3), .access = access_pmovs }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 4), .access = access_pmswinc }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 5), .access = access_pmselr }, + { CP15_PMU_SYS_REG(LO, 0, 9, 12, 6), .access = access_pmceid }, + { CP15_PMU_SYS_REG(LO, 0, 9, 12, 7), .access = access_pmceid }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 0), .access = access_pmu_evcntr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 1), .access = access_pmu_evtyper }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 2), .access = access_pmu_evcntr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 0), .access = access_pmuserenr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 1), .access = access_pminten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 2), .access = access_pminten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 3), .access = access_pmovs }, + { CP15_PMU_SYS_REG(HI, 0, 9, 14, 4), .access = access_pmceid }, + { CP15_PMU_SYS_REG(HI, 0, 9, 14, 5), .access = access_pmceid }, I've renamed AA32_ZEROHIGH -> AA32_DIRECT. Feel free to use the snippet as you see fit (or not at all). Thanks, Alex
Hi, On Mon, Apr 25, 2022 at 06:14:13PM +0100, Marc Zyngier wrote: > [+ Oliver] > > Hi Alex, > > On Mon, 25 Apr 2022 15:55:30 +0100, > Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > > > kvm->arch.arm_pmu is set when userspace attempts to set the first PMU > > attribute. As certain attributes are mandatory, arm_pmu ends up always > > being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU. > > However, this only happens if the VCPU has the PMU feature. If the VCPU > > doesn't have the feature bit set, kvm->arch.arm_pmu will be left > > uninitialized and equal to NULL. > > Although I'm not opposed to this as an immediate workaround to avoid > the ugly crash, I think sanitising the AArch32 regs is the way to go. I agree. This patch is just a band-aid. > Oliver had a stab at this a few weeks back[1], but this seem to have > stalled. > > Could you have a look and see if anything was missing (the patches > needed some rework, but I haven't checked whether DFR0 was correctly > handled or not). I'll have a look. Thanks, Alex > > Thanks, > > M. > > [1] https://lore.kernel.org/r/20220401010832.3425787-1-oupton@google.com > > -- > Without deviation from the norm, progress is not possible.
Hi Alex, On Tue, Apr 26, 2022 at 2:01 AM Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > > The odroid-c4 is somewhat of a special case, because Linux doesn't probe > > > the PMU. But the above errors can easily be reproduced on any hardware, > > > with or without a PMU driver, as long as userspace doesn't set the PMU > > > feature. > > > > This note has me wondering if we could do more negative testing with > > kvm-unit-tests just by selectively turning on/off features, with the > > expectation that tests either skip or pass. > > I'm not sure that that can be accomplished right now. kvm-unit-tests > supports only qemu as an automated test runner, and qemu enables the PMU by > default. I don't know if it can be disabled, it would be nice if it could. > I stumbled upon this by mistake, when I ran kvmtool without enabling the > PMU (the default in kvmtool is to not have it enabled). > > If it is possible to disable PMU emulation from the qemu's command line, > then it should be as simple as writing a test that expects all PMU register > accesses to trigger an undefined exception (and adding a new test > definition). You can disable the PMU with QEMU by specifying pmu=off in the -cpu argument, among other things. > If it is not possible to do this with qemu, then we would have to wait > until kvm-unit-tests can use kvmtool as the test runner. I have an RFC sent > for that [1], I need to get back to it. > > Another option would be to have this as a kselftest, although I don't know > how easy it is to register an exception handler in a kselftest. The test > could be further expanded to other registers gated by a VCPU feature being > set. We definitely have the plumbing for exception handlers in selftests, aarch64/debug-exceptions.c is an example. My thought was more general + rather lazy. For any combination of CPU features, expect that kvm-unit-tests should either pass or skip. If they fail or blow up the host then probably a good indicator of a KVM bug. > [1] https://lore.kernel.org/kvmarm/20210702163122.96110-1-alexandru.elisei@arm.com/ Thanks for the link, I'll have a peek. > > > > > Work around the fact that KVM advertises a PMU even when the VCPU feature > > > is not set by gating all PMU emulation on the feature. The guest can still > > > access the registers without KVM injecting an undefined exception. > > > > We're going to need something similar even after KVM conditionally > > advertises the PMU. > > > > WDYT about wiring up sys_reg_desc::visibility for the AArch32 PMU > > registers? For now just treat them as REG_RAZ (probably extend this to > > imply WI too) then promote to REG_HIDDEN in a later patch. > > I was thinking you can simply use .visibility = pmu_visibility, like it's > done with the PMU_SYS_REG macro: Right -- I completely agree this is where we should be when AArch32 feature registers are trapped. Seems to me all the AArch32 feature register trap logic should come later on as there's a nonzero chance I introduced a bug :) Shall we stop the bleeding w/ your originally proposed patch? Doesn't seem any more objectionable than what we're already doing. [...] > I've renamed AA32_ZEROHIGH -> AA32_DIRECT. Feel free to use the snippet as > you see fit (or not at all). To avoid shamelessly plagiarizing: may I package up what you have below as a commit coming from you? -- Thanks, Oliver
Hi, On Wed, Apr 27, 2022 at 12:57:57AM -0700, Oliver Upton wrote: > Hi Alex, > > On Tue, Apr 26, 2022 at 2:01 AM Alexandru Elisei > <alexandru.elisei@arm.com> wrote: > > > > The odroid-c4 is somewhat of a special case, because Linux doesn't probe > > > > the PMU. But the above errors can easily be reproduced on any hardware, > > > > with or without a PMU driver, as long as userspace doesn't set the PMU > > > > feature. > > > > > > This note has me wondering if we could do more negative testing with > > > kvm-unit-tests just by selectively turning on/off features, with the > > > expectation that tests either skip or pass. > > > > I'm not sure that that can be accomplished right now. kvm-unit-tests > > supports only qemu as an automated test runner, and qemu enables the PMU by > > default. I don't know if it can be disabled, it would be nice if it could. > > I stumbled upon this by mistake, when I ran kvmtool without enabling the > > PMU (the default in kvmtool is to not have it enabled). > > > > If it is possible to disable PMU emulation from the qemu's command line, > > then it should be as simple as writing a test that expects all PMU register > > accesses to trigger an undefined exception (and adding a new test > > definition). > > You can disable the PMU with QEMU by specifying pmu=off in the -cpu > argument, among other things. Thanks, didn't know that. I'll keep that in mind for testing. > > > If it is not possible to do this with qemu, then we would have to wait > > until kvm-unit-tests can use kvmtool as the test runner. I have an RFC sent > > for that [1], I need to get back to it. > > > > Another option would be to have this as a kselftest, although I don't know > > how easy it is to register an exception handler in a kselftest. The test > > could be further expanded to other registers gated by a VCPU feature being > > set. > > We definitely have the plumbing for exception handlers in selftests, > aarch64/debug-exceptions.c is an example. My thought was more general > + rather lazy. For any combination of CPU features, expect that > kvm-unit-tests should either pass or skip. If they fail or blow up the > host then probably a good indicator of a KVM bug. > > > [1] https://lore.kernel.org/kvmarm/20210702163122.96110-1-alexandru.elisei@arm.com/ > > Thanks for the link, I'll have a peek. Please keep in mind that while it works, it still needs significant changes before it is merged. > > > > > > > > Work around the fact that KVM advertises a PMU even when the VCPU feature > > > > is not set by gating all PMU emulation on the feature. The guest can still > > > > access the registers without KVM injecting an undefined exception. > > > > > > We're going to need something similar even after KVM conditionally > > > advertises the PMU. > > > > > > WDYT about wiring up sys_reg_desc::visibility for the AArch32 PMU > > > registers? For now just treat them as REG_RAZ (probably extend this to > > > imply WI too) then promote to REG_HIDDEN in a later patch. > > > > I was thinking you can simply use .visibility = pmu_visibility, like it's > > done with the PMU_SYS_REG macro: > > Right -- I completely agree this is where we should be when AArch32 > feature registers are trapped. > > Seems to me all the AArch32 feature register trap logic should come > later on as there's a nonzero chance I introduced a bug :) Shall we > stop the bleeding w/ your originally proposed patch? Doesn't seem any > more objectionable than what we're already doing. I am leaning towards merging this patch to prevent people from seeing the splat, and when the AArch32 ID reg series gets merged it can be reverted. But in the end it's up to Marc to decide what he prefers. > > [...] > > > I've renamed AA32_ZEROHIGH -> AA32_DIRECT. Feel free to use the snippet as > > you see fit (or not at all). > > To avoid shamelessly plagiarizing: may I package up what you have > below as a commit coming from you? Sure, no problem. Here's the patch, it's earlier attempt to fix NULL pointer dereference, feel free to change change it as you see fit: KVM/arm64: Hide AArch32 PMU registers when not available Commit 11663111cd49 ("KVM: arm64: Hide PMU registers from userspace when not available") hid the AArch64 PMU registers from userspace and guest when the PMU VCPU feature was not set. Do the same when the PMU registers are accessed by an AArch32 guest. This also fixes this nasty splat which was observed on an odroid-c4, by running the kvm-unit-tests' pmu-cycle-counter test with kvmtool without an emulated PMU: [ 44.993727] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000150 [ 44.996897] Mem abort info: [ 44.999654] ESR = 0x96000004 [ 45.002696] EC = 0x25: DABT (current EL), IL = 32 bits [ 45.007944] SET = 0, FnV = 0 [ 45.010956] EA = 0, S1PTW = 0 [ 45.014056] FSC = 0x04: level 0 translation fault [ 45.018886] Data abort info: [ 45.021722] ISV = 0, ISS = 0x00000004 [ 45.025532] CM = 0, WnR = 0 [ 45.028463] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000004909000 [ 45.034844] [0000000000000150] pgd=0000000000000000, p4d=0000000000000000 [ 45.041574] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 45.047068] Modules linked in: [ 45.050086] CPU: 0 PID: 259 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc3-00001-g5c5591319a5e #107 [ 45.058455] Hardware name: Hardkernel ODROID-C4 (DT) [ 45.063369] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 45.070269] pc : kvm_pmu_event_mask.isra.0+0x14/0x74 [ 45.075184] lr : kvm_pmu_set_counter_event_type+0x2c/0x80 [ 45.080532] sp : ffff80000aa839c0 [ 45.083807] x29: ffff80000aa839c0 x28: ffff000001ce5580 x27: 0000000000000000 [ 45.090882] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000d608000 [ 45.097955] x23: ffff000001ce5580 x22: ffff00000d5e4000 x21: 0000000000000000 [ 45.105027] x20: 000000000000001f x19: 00000000000003ff x18: 0000000000000000 [ 45.112100] x17: 000000008003fe98 x16: 0000000000000000 x15: 0000000000000000 [ 45.119172] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 45.126245] x11: 000000008003fdbc x10: 0000000000009d20 x9 : 000000000000001b [ 45.133318] x8 : 0000000000000000 x7 : 0000000000000d00 x6 : 0000000000009d00 [ 45.140390] x5 : 0000000000000037 x4 : 0000000000009d00 x3 : 0d09000000000000 [ 45.147463] x2 : 000000000000001f x1 : 0000000000000000 x0 : 0000000000000000 [ 45.154535] Call trace: [ 45.156947] kvm_pmu_event_mask.isra.0+0x14/0x74 [ 45.161521] kvm_pmu_set_counter_event_type+0x2c/0x80 [ 45.166522] access_pmu_evtyper+0x128/0x170 [ 45.170662] perform_access+0x34/0x80 [ 45.174284] kvm_handle_cp_32+0x13c/0x160 [ 45.178252] kvm_handle_cp15_32+0x1c/0x30 [ 45.182219] handle_exit+0x70/0x180 [ 45.185669] kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0 [ 45.190241] kvm_vcpu_ioctl+0x23c/0x940 [ 45.194036] __arm64_sys_ioctl+0xa8/0xf0 [ 45.197917] invoke_syscall+0x48/0x114 [ 45.201626] el0_svc_common.constprop.0+0xd4/0xfc [ 45.206284] do_el0_svc+0x28/0x90 [ 45.209560] el0_svc+0x34/0xb0 [ 45.212579] el0t_64_sync_handler+0xa4/0x130 [ 45.216806] el0t_64_sync+0x18c/0x190 [ 45.220432] Code: a9be7bfd 910003fd f9000bf3 52807ff3 (b9415001) [ 45.226469] ---[ end trace 0000000000000000 ]--- This is caused by the fact the kvm->arch.arm_pmu is NULL when the VCPU feature is not set and kvm_pmu_event_mask() tries to dereference the NULL pointer to get at pmuver. Now that KVM emulates ID_DFR0 and hides the PMU from the guest when the feature is not set, it is safe to inject to inject an undefined exception when the PMU is not present, as that corresponds to the architected behaviour. diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 7b45c040cc27..b0db275f41c5 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -2014,19 +2014,22 @@ static const struct sys_reg_desc cp14_64_regs[] = { { Op1( 0), CRm( 2), .access = trap_raz_wi }, }; +#define CP15_PMU_SYS_REG(_map, _Op1, _CRn, _CRm, _Op2) \ + AA32(_map), \ + Op1(_Op1), CRn(_CRn), CRm(_CRm), Op2(_Op2), \ + .visibility = pmu_visibility + /* Macro to expand the PMEVCNTRn register */ #define PMU_PMEVCNTR(n) \ - /* PMEVCNTRn */ \ - { Op1(0), CRn(0b1110), \ - CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ - access_pmu_evcntr } + { CP15_PMU_SYS_REG(DIRECT, 0, 0b1110, \ + (0b1000 | (((n) >> 3) & 0x3)), ((n) & 0x7)), \ + .access = access_pmu_evcntr } /* Macro to expand the PMEVTYPERn register */ #define PMU_PMEVTYPER(n) \ - /* PMEVTYPERn */ \ - { Op1(0), CRn(0b1110), \ - CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ - access_pmu_evtyper } + { CP15_PMU_SYS_REG(DIRECT, 0, 0b1110, \ + (0b1100 | (((n) >> 3) & 0x3)), ((n) & 0x7)), \ + .access = access_pmu_evtyper } /* * Trapped cp15 registers. TTBR0/TTBR1 get a double encoding, @@ -2067,25 +2070,25 @@ static const struct sys_reg_desc cp15_regs[] = { { Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw }, /* PMU */ - { Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr }, - { Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmcnten }, - { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmcnten }, - { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmovs }, - { Op1( 0), CRn( 9), CRm(12), Op2( 4), access_pmswinc }, - { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr }, - { AA32(LO), Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid }, - { AA32(LO), Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmceid }, - { Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_evcntr }, - { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_evtyper }, - { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_evcntr }, - { Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmuserenr }, - { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pminten }, - { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pminten }, - { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmovs }, - { AA32(HI), Op1( 0), CRn( 9), CRm(14), Op2( 4), access_pmceid }, - { AA32(HI), Op1( 0), CRn( 9), CRm(14), Op2( 5), access_pmceid }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 0), .access = access_pmcr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 1), .access = access_pmcnten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 2), .access = access_pmcnten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 3), .access = access_pmovs }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 4), .access = access_pmswinc }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 5), .access = access_pmselr }, + { CP15_PMU_SYS_REG(LO, 0, 9, 12, 6), .access = access_pmceid }, + { CP15_PMU_SYS_REG(LO, 0, 9, 12, 7), .access = access_pmceid }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 0), .access = access_pmu_evcntr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 1), .access = access_pmu_evtyper }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 2), .access = access_pmu_evcntr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 0), .access = access_pmuserenr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 1), .access = access_pminten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 2), .access = access_pminten }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 3), .access = access_pmovs }, + { CP15_PMU_SYS_REG(HI, 0, 9, 14, 4), .access = access_pmceid }, + { CP15_PMU_SYS_REG(HI, 0, 9, 14, 5), .access = access_pmceid }, /* PMMIR */ - { Op1( 0), CRn( 9), CRm(14), Op2( 6), trap_raz_wi }, + { CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 6), .access = trap_raz_wi }, /* PRRR/MAIR0 */ { AA32(LO), Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, MAIR_EL1 }, @@ -2170,7 +2173,7 @@ static const struct sys_reg_desc cp15_regs[] = { PMU_PMEVTYPER(29), PMU_PMEVTYPER(30), /* PMCCFILTR */ - { Op1(0), CRn(14), CRm(15), Op2(7), access_pmu_evtyper }, + { CP15_PMU_SYS_REG(DIRECT, 0, 14, 15, 7), .access = access_pmu_evtyper }, { Op1(1), CRn( 0), CRm( 0), Op2(0), access_ccsidr }, { Op1(1), CRn( 0), CRm( 0), Op2(1), access_clidr }, @@ -2179,7 +2182,7 @@ static const struct sys_reg_desc cp15_regs[] = { static const struct sys_reg_desc cp15_64_regs[] = { { Op1( 0), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR0_EL1 }, - { Op1( 0), CRn( 0), CRm( 9), Op2( 0), access_pmu_evcntr }, + { CP15_PMU_SYS_REG(DIRECT, 0, 0, 9, 0), .access = access_pmu_evcntr }, { Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_SGI1R */ { Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR1_EL1 }, { Op1( 1), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_ASGI1R */ Thanks, Alex
On Wed, 27 Apr 2022 10:45:39 +0100, Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > Hi, > > On Wed, Apr 27, 2022 at 12:57:57AM -0700, Oliver Upton wrote: > > Hi Alex, > > > > Seems to me all the AArch32 feature register trap logic should come > > later on as there's a nonzero chance I introduced a bug :) Shall we > > stop the bleeding w/ your originally proposed patch? Doesn't seem any > > more objectionable than what we're already doing. > > I am leaning towards merging this patch to prevent people from seeing the > splat, and when the AArch32 ID reg series gets merged it can be reverted. > But in the end it's up to Marc to decide what he prefers. Yeah, I'd like to plug this ASAP. I'll queue the workaround for 5.18, and we can rework that the proper way for 5.19. Thanks to both for working on it together, much appreciated. M.
On Mon, 25 Apr 2022 15:55:30 +0100, Alexandru Elisei wrote: > kvm->arch.arm_pmu is set when userspace attempts to set the first PMU > attribute. As certain attributes are mandatory, arm_pmu ends up always > being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU. > However, this only happens if the VCPU has the PMU feature. If the VCPU > doesn't have the feature bit set, kvm->arch.arm_pmu will be left > uninitialized and equal to NULL. > > [...] Applied to fixes, thanks! [1/1] KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set commit: 8f6379e207e7d834065a080f407a60d67349d961 Cheers, M.
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index 78fdc443adc7..3dc990ac4f44 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -177,6 +177,9 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) struct kvm_pmu *pmu = &vcpu->arch.pmu; struct kvm_pmc *pmc = &pmu->pmc[select_idx]; + if (!kvm_vcpu_has_pmu(vcpu)) + return 0; + counter = kvm_pmu_get_pair_counter_value(vcpu, pmc); if (kvm_pmu_pmc_is_chained(pmc) && @@ -198,6 +201,9 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) { u64 reg; + if (!kvm_vcpu_has_pmu(vcpu)) + return; + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); @@ -322,6 +328,9 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val) struct kvm_pmu *pmu = &vcpu->arch.pmu; struct kvm_pmc *pmc; + if (!kvm_vcpu_has_pmu(vcpu)) + return; + if (!(__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) || !val) return; @@ -357,7 +366,7 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val) struct kvm_pmu *pmu = &vcpu->arch.pmu; struct kvm_pmc *pmc; - if (!val) + if (!kvm_vcpu_has_pmu(vcpu) || !val) return; for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { @@ -527,6 +536,9 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) struct kvm_pmu *pmu = &vcpu->arch.pmu; int i; + if (!kvm_vcpu_has_pmu(vcpu)) + return; + if (!(__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E)) return; @@ -576,6 +588,9 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) { int i; + if (!kvm_vcpu_has_pmu(vcpu)) + return; + if (val & ARMV8_PMU_PMCR_E) { kvm_pmu_enable_counter_mask(vcpu, __vcpu_sys_reg(vcpu, PMCNTENSET_EL0)); @@ -739,6 +754,9 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, { u64 reg, mask; + if (!kvm_vcpu_has_pmu(vcpu)) + return; + mask = ARMV8_PMU_EVTYPE_MASK; mask &= ~ARMV8_PMU_EVTYPE_EVENT; mask |= kvm_pmu_event_mask(vcpu->kvm); @@ -827,6 +845,9 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) u64 val, mask = 0; int base, i, nr_events; + if (!kvm_vcpu_has_pmu(vcpu)) + return 0; + if (!pmceid1) { val = read_sysreg(pmceid0_el0); base = 0;
kvm->arch.arm_pmu is set when userspace attempts to set the first PMU attribute. As certain attributes are mandatory, arm_pmu ends up always being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU. However, this only happens if the VCPU has the PMU feature. If the VCPU doesn't have the feature bit set, kvm->arch.arm_pmu will be left uninitialized and equal to NULL. KVM doesn't do ID register emulation for 32-bit guests and accesses to the PMU registers aren't gated by the pmu_visibility() function. This is done to prevent injecting unexpected undefined exceptions in guests which have detected the presence of a hardware PMU. But even though the VCPU feature is missing, KVM still attempts to emulate certain aspects of the PMU when PMU registers are accessed. This leads to a NULL pointer dereference like this one, which happens on an odroid-c4 board when running the kvm-unit-tests pmu-cycle-counter test with kvmtool and without the PMU feature being set: [ 454.402699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000150 [ 454.405865] Mem abort info: [ 454.408596] ESR = 0x96000004 [ 454.411638] EC = 0x25: DABT (current EL), IL = 32 bits [ 454.416901] SET = 0, FnV = 0 [ 454.419909] EA = 0, S1PTW = 0 [ 454.423010] FSC = 0x04: level 0 translation fault [ 454.427841] Data abort info: [ 454.430687] ISV = 0, ISS = 0x00000004 [ 454.434484] CM = 0, WnR = 0 [ 454.437404] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000c924000 [ 454.443800] [0000000000000150] pgd=0000000000000000, p4d=0000000000000000 [ 454.450528] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 454.456036] Modules linked in: [ 454.459053] CPU: 1 PID: 267 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc4 #113 [ 454.465697] Hardware name: Hardkernel ODROID-C4 (DT) [ 454.470612] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 454.477512] pc : kvm_pmu_event_mask.isra.0+0x14/0x74 [ 454.482427] lr : kvm_pmu_set_counter_event_type+0x2c/0x80 [ 454.487775] sp : ffff80000a9839c0 [ 454.491050] x29: ffff80000a9839c0 x28: ffff000000a83a00 x27: 0000000000000000 [ 454.498127] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000a510000 [ 454.505198] x23: ffff000000a83a00 x22: ffff000003b01000 x21: 0000000000000000 [ 454.512271] x20: 000000000000001f x19: 00000000000003ff x18: 0000000000000000 [ 454.519343] x17: 000000008003fe98 x16: 0000000000000000 x15: 0000000000000000 [ 454.526416] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 454.533489] x11: 000000008003fdbc x10: 0000000000009d20 x9 : 000000000000001b [ 454.540561] x8 : 0000000000000000 x7 : 0000000000000d00 x6 : 0000000000009d00 [ 454.547633] x5 : 0000000000000037 x4 : 0000000000009d00 x3 : 0d09000000000000 [ 454.554705] x2 : 000000000000001f x1 : 0000000000000000 x0 : 0000000000000000 [ 454.561779] Call trace: [ 454.564191] kvm_pmu_event_mask.isra.0+0x14/0x74 [ 454.568764] kvm_pmu_set_counter_event_type+0x2c/0x80 [ 454.573766] access_pmu_evtyper+0x128/0x170 [ 454.577905] perform_access+0x34/0x80 [ 454.581527] kvm_handle_cp_32+0x13c/0x160 [ 454.585495] kvm_handle_cp15_32+0x1c/0x30 [ 454.589462] handle_exit+0x70/0x180 [ 454.592912] kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0 [ 454.597485] kvm_vcpu_ioctl+0x23c/0x940 [ 454.601280] __arm64_sys_ioctl+0xa8/0xf0 [ 454.605160] invoke_syscall+0x48/0x114 [ 454.608869] el0_svc_common.constprop.0+0xd4/0xfc [ 454.613527] do_el0_svc+0x28/0x90 [ 454.616803] el0_svc+0x34/0xb0 [ 454.619822] el0t_64_sync_handler+0xa4/0x130 [ 454.624049] el0t_64_sync+0x18c/0x190 [ 454.627675] Code: a9be7bfd 910003fd f9000bf3 52807ff3 (b9415001) [ 454.633714] ---[ end trace 0000000000000000 ]--- In this particular case, Linux hasn't detected the presence of a hardware PMU because the PMU node is missing from the DTB, so userspace would have been unable to set the VCPU PMU feature even if it attempted it. What happens is that the 32-bit guest reads ID_DFR0, which advertises the presence of the PMU, and when it tries to program a counter, it triggers the NULL pointer dereference because kvm->arch.arm_pmu is NULL. kvm-arch.arm_pmu was introduced by commit 46b187821472 ("KVM: arm64: Keep a per-VM pointer to the default PMU"). Until that commit, this error would be triggered instead: [ 73.388140] ------------[ cut here ]------------ [ 73.388189] Unknown PMU version 0 [ 73.390420] WARNING: CPU: 1 PID: 264 at arch/arm64/kvm/pmu-emul.c:36 kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.399821] Modules linked in: [ 73.402835] CPU: 1 PID: 264 Comm: kvm-vcpu-0 Not tainted 5.17.0 #114 [ 73.409132] Hardware name: Hardkernel ODROID-C4 (DT) [ 73.414048] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 73.420948] pc : kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.425863] lr : kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.430779] sp : ffff80000a8db9b0 [ 73.434055] x29: ffff80000a8db9b0 x28: ffff000000dbaac0 x27: 0000000000000000 [ 73.441131] x26: ffff000000dbaac0 x25: 00000000c600000d x24: 0000000000180720 [ 73.448203] x23: ffff800009ffbe10 x22: ffff00000b612000 x21: 0000000000000000 [ 73.455276] x20: 000000000000001f x19: 0000000000000000 x18: ffffffffffffffff [ 73.462348] x17: 000000008003fe98 x16: 0000000000000000 x15: 0720072007200720 [ 73.469420] x14: 0720072007200720 x13: ffff800009d32488 x12: 00000000000004e6 [ 73.476493] x11: 00000000000001a2 x10: ffff800009d32488 x9 : ffff800009d32488 [ 73.483565] x8 : 00000000ffffefff x7 : ffff800009d8a488 x6 : ffff800009d8a488 [ 73.490638] x5 : ffff0000f461a9d8 x4 : 0000000000000000 x3 : 0000000000000001 [ 73.497710] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000000dbaac0 [ 73.504784] Call trace: [ 73.507195] kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.511768] kvm_pmu_set_counter_event_type+0x2c/0x80 [ 73.516770] access_pmu_evtyper+0x128/0x16c [ 73.520910] perform_access+0x34/0x80 [ 73.524532] kvm_handle_cp_32+0x13c/0x160 [ 73.528500] kvm_handle_cp15_32+0x1c/0x30 [ 73.532467] handle_exit+0x70/0x180 [ 73.535917] kvm_arch_vcpu_ioctl_run+0x20c/0x6e0 [ 73.540489] kvm_vcpu_ioctl+0x2b8/0x9e0 [ 73.544283] __arm64_sys_ioctl+0xa8/0xf0 [ 73.548165] invoke_syscall+0x48/0x114 [ 73.551874] el0_svc_common.constprop.0+0xd4/0xfc [ 73.556531] do_el0_svc+0x28/0x90 [ 73.559808] el0_svc+0x28/0x80 [ 73.562826] el0t_64_sync_handler+0xa4/0x130 [ 73.567054] el0t_64_sync+0x1a0/0x1a4 [ 73.570676] ---[ end trace 0000000000000000 ]--- [ 73.575382] kvm: pmu event creation failed -2 The root cause remains the same: kvm->arch.pmuver was never set to something sensible because the VCPU feature itself was never set. The odroid-c4 is somewhat of a special case, because Linux doesn't probe the PMU. But the above errors can easily be reproduced on any hardware, with or without a PMU driver, as long as userspace doesn't set the PMU feature. Work around the fact that KVM advertises a PMU even when the VCPU feature is not set by gating all PMU emulation on the feature. The guest can still access the registers without KVM injecting an undefined exception. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> --- This looked to me like the easiest fix. I'm open to suggestions if someone comes up with a better solution. In the long run, I think the best fix would be to do ID register emulation for 32-bit guests, like KVM does today for 64-bit guests. arch/arm64/kvm/pmu-emul.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)