diff mbox series

[5/5] KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high

Message ID 20220407162327.396183-6-alexandru.elisei@arm.com (mailing list archive)
State New, archived
Headers show
Series arm64: Treat ESR_ELx as a 64-bit register | expand

Commit Message

Alexandru Elisei April 7, 2022, 4:23 p.m. UTC
When userspace is debugging a VM, the kvm_debug_exit_arch part of the
kvm_run struct contains arm64 specific debug information: the ESR_EL2
value, encoded in the field "hsr", and the address of the instruction
that caused the exception, encoded in the field "far".

Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
kvm_debug_exit_arch.hsr cannot be changed to match because that would
change the memory layout of the struct on big endian machines:

Current layout:			| Layout with "hsr" extended to 64 bits:
				|
offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)

which breaks existing code.

The padding is inserted by the compiler because the "far" field must be
aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
page 18) and the struct itself must be aligned to 8 bytes (the struct must
be aligned to the maximum alignment of its fields - aapcs64, page 18),
which means that "hsr" must be aligned to 8 bytes as it is the first field
in the struct.

To avoid changing the struct size and layout for the existing fields, add a
new field, "hsr_high", which replaces the existing padding. "hsr_high" will
be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
both on big and little endian machine, becomes:

Layout with "hsr_high" added:

offset 0: ESR_EL2[31:0]  (hsr)
offset 4: ESR_EL2[61:32] (hsr_high)
offset 8: FAR_EL2[61:0]  (far)

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
I'm happy to drop the patch if this change is premature; I made the change
now, instead of waiting until there's a use case for the upper 32 bits, so
userspace has had the time to pick up the updated struct definition by the
time KVM will start making use of the new field.

 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/handle_exit.c      | 1 +
 2 files changed, 2 insertions(+)

Comments

Marc Zyngier April 8, 2022, 7:47 a.m. UTC | #1
Hi Alex,

On Thu, 07 Apr 2022 17:23:27 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> When userspace is debugging a VM, the kvm_debug_exit_arch part of the
> kvm_run struct contains arm64 specific debug information: the ESR_EL2
> value, encoded in the field "hsr", and the address of the instruction
> that caused the exception, encoded in the field "far".
> 
> Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
> kvm_debug_exit_arch.hsr cannot be changed to match because that would
> change the memory layout of the struct on big endian machines:
> 
> Current layout:			| Layout with "hsr" extended to 64 bits:
> 				|
> offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
> offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
> offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)
> 
> which breaks existing code.
> 
> The padding is inserted by the compiler because the "far" field must be
> aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
> page 18) and the struct itself must be aligned to 8 bytes (the struct must
> be aligned to the maximum alignment of its fields - aapcs64, page 18),
> which means that "hsr" must be aligned to 8 bytes as it is the first field
> in the struct.
> 
> To avoid changing the struct size and layout for the existing fields, add a
> new field, "hsr_high", which replaces the existing padding. "hsr_high" will
> be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
> both on big and little endian machine, becomes:
> 
> Layout with "hsr_high" added:
> 
> offset 0: ESR_EL2[31:0]  (hsr)
> offset 4: ESR_EL2[61:32] (hsr_high)
> offset 8: FAR_EL2[61:0]  (far)

My concern with this change is that it isn't clear what the padding is
currently initialised to, and I don't think there is any guarantee
that it is zeroed. With that, a new userspace on an old kernel would
interpret hsr_high, and potentially observe stuff that wasn't supposed
to be interpreted.

That's yet another mistake in our userspace ABI (where is the time
machine when you need it?).

In order to do this, we must advertise to userspace that we provide
more information. This probably means adding a flag of some sort to
kvm_run (there are at least 128 bits of x86 stuff that can be readily
reclaimed).

What do you think?

	M.
Alexandru Elisei April 11, 2022, 10:46 a.m. UTC | #2
Hi Marc,

On Fri, Apr 08, 2022 at 08:47:00AM +0100, Marc Zyngier wrote:
> Hi Alex,
> 
> On Thu, 07 Apr 2022 17:23:27 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > When userspace is debugging a VM, the kvm_debug_exit_arch part of the
> > kvm_run struct contains arm64 specific debug information: the ESR_EL2
> > value, encoded in the field "hsr", and the address of the instruction
> > that caused the exception, encoded in the field "far".
> > 
> > Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
> > kvm_debug_exit_arch.hsr cannot be changed to match because that would
> > change the memory layout of the struct on big endian machines:
> > 
> > Current layout:			| Layout with "hsr" extended to 64 bits:
> > 				|
> > offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
> > offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
> > offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)
> > 
> > which breaks existing code.
> > 
> > The padding is inserted by the compiler because the "far" field must be
> > aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
> > page 18) and the struct itself must be aligned to 8 bytes (the struct must
> > be aligned to the maximum alignment of its fields - aapcs64, page 18),
> > which means that "hsr" must be aligned to 8 bytes as it is the first field
> > in the struct.
> > 
> > To avoid changing the struct size and layout for the existing fields, add a
> > new field, "hsr_high", which replaces the existing padding. "hsr_high" will
> > be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
> > both on big and little endian machine, becomes:
> > 
> > Layout with "hsr_high" added:
> > 
> > offset 0: ESR_EL2[31:0]  (hsr)
> > offset 4: ESR_EL2[61:32] (hsr_high)
> > offset 8: FAR_EL2[61:0]  (far)
> 
> My concern with this change is that it isn't clear what the padding is
> currently initialised to, and I don't think there is any guarantee
> that it is zeroed. With that, a new userspace on an old kernel would
> interpret hsr_high, and potentially observe stuff that wasn't supposed
> to be interpreted.

You are right, I didn't think about this scenario. Did some digging, and
C99 explicitely states that padding is uninitialized (September 7, 2007,
draft, page 38), so I assume that it's the same for C89 (couldn't find a
free source for the standard).

> 
> That's yet another mistake in our userspace ABI (where is the time
> machine when you need it?).

To avoid this sort of thing happening in the future, KVM/arm64 could
mandate that all structs that are part of the userspace API have the
padding explicitly declared as a struct field and that padding must always
be set to zero by userspace before a syscall. KVM would then check that the
padding is zero and return -EINVAL if userspace didn't clear it correctly,
to enforce this convention. I think that should be forward compatible with
repurposing the padding to add another field, unless 0 has a special
meaning for the field that is added, which I believe to be highly unlikely.

> 
> In order to do this, we must advertise to userspace that we provide
> more information. This probably means adding a flag of some sort to
> kvm_run (there are at least 128 bits of x86 stuff that can be readily
> reclaimed).

We could add a flag to kvm_run.flags that is set only when
kvm_run.exit_reason == KVM_EXIT_DEBUG, something like
KVM_DEBUG_ARM_HSR_HIGH_SET (or PRESENT). flags has 16 bits which are unused
today, so I don't think it's too costly to use one bit for this.

One other option would be to wait to expose the upper 32 bits until KVM
supports a hardware feature that makes use of those bits. That means
advertising the feature to userspace, which KVM might or might not want to
do. And KVM today doesn't do any sanitisation/masking on the hsr value that
is reported to userspace, and tying hsr_high to a particular feature might
mean that KVM will have to sanitise the upper bits if said feature is
opt-in by userspace.

My preference is for the first approach because the second approach looks
more complicated.

There's always the option to wait until KVM makes use of the upper 32 bits
and decide then, when we have more information.

Thanks,
Alex

> 
> What do you think?
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
Marc Zyngier April 12, 2022, 11:30 a.m. UTC | #3
On Mon, 11 Apr 2022 11:46:09 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On Fri, Apr 08, 2022 at 08:47:00AM +0100, Marc Zyngier wrote:
> > Hi Alex,
> > 
> > On Thu, 07 Apr 2022 17:23:27 +0100,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > 
> > > When userspace is debugging a VM, the kvm_debug_exit_arch part of the
> > > kvm_run struct contains arm64 specific debug information: the ESR_EL2
> > > value, encoded in the field "hsr", and the address of the instruction
> > > that caused the exception, encoded in the field "far".
> > > 
> > > Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
> > > kvm_debug_exit_arch.hsr cannot be changed to match because that would
> > > change the memory layout of the struct on big endian machines:
> > > 
> > > Current layout:			| Layout with "hsr" extended to 64 bits:
> > > 				|
> > > offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
> > > offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
> > > offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)
> > > 
> > > which breaks existing code.
> > > 
> > > The padding is inserted by the compiler because the "far" field must be
> > > aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
> > > page 18) and the struct itself must be aligned to 8 bytes (the struct must
> > > be aligned to the maximum alignment of its fields - aapcs64, page 18),
> > > which means that "hsr" must be aligned to 8 bytes as it is the first field
> > > in the struct.
> > > 
> > > To avoid changing the struct size and layout for the existing fields, add a
> > > new field, "hsr_high", which replaces the existing padding. "hsr_high" will
> > > be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
> > > both on big and little endian machine, becomes:
> > > 
> > > Layout with "hsr_high" added:
> > > 
> > > offset 0: ESR_EL2[31:0]  (hsr)
> > > offset 4: ESR_EL2[61:32] (hsr_high)
> > > offset 8: FAR_EL2[61:0]  (far)
> > 
> > My concern with this change is that it isn't clear what the padding is
> > currently initialised to, and I don't think there is any guarantee
> > that it is zeroed. With that, a new userspace on an old kernel would
> > interpret hsr_high, and potentially observe stuff that wasn't supposed
> > to be interpreted.
> 
> You are right, I didn't think about this scenario. Did some digging, and
> C99 explicitely states that padding is uninitialized (September 7, 2007,
> draft, page 38), so I assume that it's the same for C89 (couldn't find a
> free source for the standard).
> 
> > 
> > That's yet another mistake in our userspace ABI (where is the time
> > machine when you need it?).
> 
> To avoid this sort of thing happening in the future, KVM/arm64 could
> mandate that all structs that are part of the userspace API have the
> padding explicitly declared as a struct field and that padding must always
> be set to zero by userspace before a syscall.

That's what we were aiming for for most structures, but clearly missed
some. Debug is obviously an unloved part of the architecture.

> KVM would then check that the
> padding is zero and return -EINVAL if userspace didn't clear it correctly,
> to enforce this convention. I think that should be forward compatible with
> repurposing the padding to add another field, unless 0 has a special
> meaning for the field that is added, which I believe to be highly unlikely.
> 
> > 
> > In order to do this, we must advertise to userspace that we provide
> > more information. This probably means adding a flag of some sort to
> > kvm_run (there are at least 128 bits of x86 stuff that can be readily
> > reclaimed).
> 
> We could add a flag to kvm_run.flags that is set only when
> kvm_run.exit_reason == KVM_EXIT_DEBUG, something like
> KVM_DEBUG_ARM_HSR_HIGH_SET (or PRESENT). flags has 16 bits which are unused
> today, so I don't think it's too costly to use one bit for this.

That'd be a sensible approach. We use any for KVM/arm64 yet, and this
is all arch-specific anyway.

> 
> One other option would be to wait to expose the upper 32 bits until KVM
> supports a hardware feature that makes use of those bits. That means
> advertising the feature to userspace, which KVM might or might not want to
> do. And KVM today doesn't do any sanitisation/masking on the hsr value that
> is reported to userspace, and tying hsr_high to a particular feature might
> mean that KVM will have to sanitise the upper bits if said feature is
> opt-in by userspace.
> 
> My preference is for the first approach because the second approach looks
> more complicated.

+1.

> There's always the option to wait until KVM makes use of the upper 32 bits
> and decide then, when we have more information.

The sooner, the better. Updating userspace takes *years*, so let's do
it ASAP.

Thanks,

	M.
diff mbox series

Patch

diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index c1b6ddc02d2f..f51f9e413cb8 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -141,6 +141,7 @@  struct kvm_guest_debug_arch {
 
 struct kvm_debug_exit_arch {
 	__u32 hsr;
+	__u32 hsr_high;	/* ESR_EL2[61:32] */
 	__u64 far;	/* used for watchpoints */
 };
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 93d92130d36c..2ccf4e6372c2 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -121,6 +121,7 @@  static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu)
 
 	run->exit_reason = KVM_EXIT_DEBUG;
 	run->debug.arch.hsr = lower_32_bits(esr);
+	run->debug.arch.hsr_high = upper_32_bits(esr);
 
 	if (ESR_ELx_EC(esr) == ESR_ELx_EC_WATCHPT_LOW)
 		run->debug.arch.far = vcpu->arch.fault.far_el2;