diff mbox series

[v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it

Message ID 20210323180057.263356-1-alexandru.elisei@arm.com (mailing list archive)
State New, archived
Headers show
Series [v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it | expand

Commit Message

Alexandru Elisei March 23, 2021, 6 p.m. UTC
When a VCPU is created, the kvm_vcpu struct is initialized to zero in
kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
result is that KVM executes for a short time with MDCR_EL2 set to zero.

This has several unintended consequences:

* Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
  DDI 0487G.a, page D13-3820. The behavior specified by the architecture
  in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
  value less than or equal to PMCR_EL0.N, which means that an unknown
  number of counters are now disabled by MDCR_EL2.HPME, which is zero.

* The host configuration for the other debug features controlled by
  MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
  doesn't use the other fields, but that might change in the future.

Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
has a consistent value after each vcpu_load().

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
Found by code inspection. Based on v5.12-rc4.

Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
value, 0xe66 (FEAT_SPE is not implemented by the PE).

This patch was initially part of the KVM SPE series [1], but those patches
haven't seen much activity, so I thought it would be a good idea to send
this patch separately to draw more attention to it.

Changes in v2:
* Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
  vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
* Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
  vcpu->guest_debug means.

[1] https://www.spinics.net/lists/kvm-arm/msg42959.html

 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c              |  3 +-
 arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
 3 files changed, 59 insertions(+), 27 deletions(-)

Comments

Marc Zyngier March 30, 2021, 9:55 a.m. UTC | #1
Hi Alex,

On Tue, 23 Mar 2021 18:00:57 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
> result is that KVM executes for a short time with MDCR_EL2 set to zero.
> 
> This has several unintended consequences:
> 
> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
>   DDI 0487G.a, page D13-3820. The behavior specified by the architecture
>   in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
>   value less than or equal to PMCR_EL0.N, which means that an unknown
>   number of counters are now disabled by MDCR_EL2.HPME, which is zero.
> 
> * The host configuration for the other debug features controlled by
>   MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
>   doesn't use the other fields, but that might change in the future.
> 
> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
> kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
> has a consistent value after each vcpu_load().
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
different point. Probably worth a Fixes tag.

> ---
> Found by code inspection. Based on v5.12-rc4.
> 
> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
> value, 0xe66 (FEAT_SPE is not implemented by the PE).
> 
> This patch was initially part of the KVM SPE series [1], but those patches
> haven't seen much activity, so I thought it would be a good idea to send
> this patch separately to draw more attention to it.
> 
> Changes in v2:
> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
>   vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
>   vcpu->guest_debug means.
> 
> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
> 
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/kvm/arm.c              |  3 +-
>  arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
>  3 files changed, 59 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3d10e6527f7d..858c2fcfc043 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>  
>  void kvm_arm_init_debug(void);
> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7f06ba76698d..7088d8fe7186 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.has_run_once = true;
>  
> +	kvm_arm_vcpu_init_debug(vcpu);
> +
>  	if (likely(irqchip_in_kernel(kvm))) {
>  		/*
>  		 * Map the VGIC hardware resources before running a vcpu the
> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		}
>  
>  		kvm_arm_setup_debug(vcpu);
> -

Spurious change?

>  		/**************************************************************
>  		 * Enter the guest
>  		 */
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 7a7e425616b5..3626d03354f6 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
>  	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
>  }
>  
> +/**
> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
> + *
> + * @vcpu:	the vcpu pointer
> + * @host_mdcr:  host mdcr_el2 value
> + *
> + * This ensures we will trap access to:
> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> + *  - Debug ROM Address (MDCR_EL2_TDRA)
> + *  - OS related registers (MDCR_EL2_TDOSA)
> + *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> + */
> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
> +{
> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
> +
> +	/*
> +	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
> +	 * to the profiling buffer.
> +	 */
> +	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> +				MDCR_EL2_TPMS |
> +				MDCR_EL2_TPMCR |
> +				MDCR_EL2_TDRA |
> +				MDCR_EL2_TDOSA);
> +
> +	/* Is the VM being debugged by userspace? */
> +	if (vcpu->guest_debug) {
> +		/* Route all software debug exceptions to EL2 */
> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
> +			trap_debug = true;
> +	}
> +
> +	/* Trap debug register access */
> +	if (trap_debug)
> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
> +
> +	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
> +}
> +
> +/**
> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
> + *
> + * @vcpu:	the vcpu pointer
> + *
> + * Set vcpu initial mdcr_el2 value.
> + */
> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
> +{
> +	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));

Given that kvm_arm_setup_mdcr_el2() always takes the current host
value for mdcr_el2, why not moving the read into it and be done with
it?

Also, do we really need an extra wrapper?

> +}
> +
>  /**
>   * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
>   */
> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>   * @vcpu:	the vcpu pointer
>   *
>   * This is called before each entry into the hypervisor to setup any
> - * debug related registers. Currently this just ensures we will trap
> - * access to:
> - *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> - *  - Debug ROM Address (MDCR_EL2_TDRA)
> - *  - OS related registers (MDCR_EL2_TDOSA)
> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> + * debug related registers.
>   *
>   * Additionally, KVM only traps guest accesses to the debug registers if
>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>  
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  {
> -	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>  	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
>  
>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
>  
> -	/*
> -	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
> -	 * to the profiling buffer.
> -	 */
> -	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
> -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> -				MDCR_EL2_TPMS |
> -				MDCR_EL2_TPMCR |
> -				MDCR_EL2_TDRA |
> -				MDCR_EL2_TDOSA);
> +	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
>  
>  	/* Is Guest debugging in effect? */
>  	if (vcpu->guest_debug) {
> -		/* Route all software debug exceptions to EL2 */
> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> -
>  		/* Save guest debug state */
>  		save_guest_debug_regs(vcpu);
>  
> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  
>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
>  			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
> -			trap_debug = true;

There is something that slightly worries me here: there is now a
disconnect between flagging debug as dirty and setting the
trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
set the trap bits *before* setting the dirty bit itself.

Here, I believe you end up with guest/host confusion of breakpoints,
which isn't great. Or did I miss something?

>  
>  			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
>  						&vcpu->arch.debug_ptr->dbg_bcr[0],
> @@ -189,10 +224,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  	BUG_ON(!vcpu->guest_debug &&
>  		vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
>  
> -	/* Trap debug register access */
> -	if (trap_debug)
> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
> -
>  	/* If KDE or MDE are set, perform a full save/restore cycle. */
>  	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
>  		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
> @@ -201,7 +232,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  	if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
>  		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
>  
> -	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>  	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
>  }

Thanks,

	M.
Alexandru Elisei March 30, 2021, 5:13 p.m. UTC | #2
Hi Marc,

Thanks for having a look!

On 3/30/21 10:55 AM, Marc Zyngier wrote:
> Hi Alex,
>
> On Tue, 23 Mar 2021 18:00:57 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
>> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
>> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
>> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
>> result is that KVM executes for a short time with MDCR_EL2 set to zero.
>>
>> This has several unintended consequences:
>>
>> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
>>   DDI 0487G.a, page D13-3820. The behavior specified by the architecture
>>   in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
>>   value less than or equal to PMCR_EL0.N, which means that an unknown
>>   number of counters are now disabled by MDCR_EL2.HPME, which is zero.
>>
>> * The host configuration for the other debug features controlled by
>>   MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
>>   doesn't use the other fields, but that might change in the future.
>>
>> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
>> kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
>> has a consistent value after each vcpu_load().
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
> arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
> different point. Probably worth a Fixes tag.

This bug is present in the commit you are mentioning, and from what I can tell
it's also present in the commit it's fixing (d5a21bcc2995 ("KVM: arm64: Move
common VHE/non-VHE trap config in separate functions")) - vcpu->arch.mdcr_el2 is
computed in kvm_arm_setup_debug(), which is called after vcpu_load(). My guess is
that this bug is from VHE support was added (or soon after).

I can dig further, how far back in time should I aim for?

>
>> ---
>> Found by code inspection. Based on v5.12-rc4.
>>
>> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
>> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
>> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
>> value, 0xe66 (FEAT_SPE is not implemented by the PE).
>>
>> This patch was initially part of the KVM SPE series [1], but those patches
>> haven't seen much activity, so I thought it would be a good idea to send
>> this patch separately to draw more attention to it.
>>
>> Changes in v2:
>> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
>>   vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
>> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
>>   vcpu->guest_debug means.
>>
>> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
>>
>>  arch/arm64/include/asm/kvm_host.h |  1 +
>>  arch/arm64/kvm/arm.c              |  3 +-
>>  arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
>>  3 files changed, 59 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 3d10e6527f7d..858c2fcfc043 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 7f06ba76698d..7088d8fe7186 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>  
>>  	vcpu->arch.has_run_once = true;
>>  
>> +	kvm_arm_vcpu_init_debug(vcpu);
>> +
>>  	if (likely(irqchip_in_kernel(kvm))) {
>>  		/*
>>  		 * Map the VGIC hardware resources before running a vcpu the
>> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>  		}
>>  
>>  		kvm_arm_setup_debug(vcpu);
>> -
> Spurious change?

Definitely, thank you for spotting it.

>
>>  		/**************************************************************
>>  		 * Enter the guest
>>  		 */
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index 7a7e425616b5..3626d03354f6 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
>>  	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
>>  }
>>  
>> +/**
>> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
>> + *
>> + * @vcpu:	the vcpu pointer
>> + * @host_mdcr:  host mdcr_el2 value
>> + *
>> + * This ensures we will trap access to:
>> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>> + *  - Debug ROM Address (MDCR_EL2_TDRA)
>> + *  - OS related registers (MDCR_EL2_TDOSA)
>> + *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>> + */
>> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
>> +{
>> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>> +
>> +	/*
>> +	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>> +	 * to the profiling buffer.
>> +	 */
>> +	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
>> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>> +				MDCR_EL2_TPMS |
>> +				MDCR_EL2_TPMCR |
>> +				MDCR_EL2_TDRA |
>> +				MDCR_EL2_TDOSA);
>> +
>> +	/* Is the VM being debugged by userspace? */
>> +	if (vcpu->guest_debug) {
>> +		/* Route all software debug exceptions to EL2 */
>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
>> +			trap_debug = true;
>> +	}
>> +
>> +	/* Trap debug register access */
>> +	if (trap_debug)
>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>> +
>> +	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>> +}
>> +
>> +/**
>> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
>> + *
>> + * @vcpu:	the vcpu pointer
>> + *
>> + * Set vcpu initial mdcr_el2 value.
>> + */
>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
>> +{
>> +	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
> Given that kvm_arm_setup_mdcr_el2() always takes the current host
> value for mdcr_el2, why not moving the read into it and be done with
> it?

kvm_arm_setup_debug() is called with preemption disabled, and it can use
__this_cpu_read(). kvm_arm_vcpu_init_debug() is called with preemption enabled, so
it must use this_cpu_read(). I wanted to make the distinction because
kvm_arm_setup_debug() is in the run loop.

>
> Also, do we really need an extra wrapper?

I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use this_cpu_read() for
the host's mdcr_el2 value at the cost of a preempt disable/enable in the run loop
when preemption is disabled. If you think that would make the code easier to
follow, I can certainly do that.

>
>> +}
>> +
>>  /**
>>   * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
>>   */
>> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>   * @vcpu:	the vcpu pointer
>>   *
>>   * This is called before each entry into the hypervisor to setup any
>> - * debug related registers. Currently this just ensures we will trap
>> - * access to:
>> - *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>> - *  - Debug ROM Address (MDCR_EL2_TDRA)
>> - *  - OS related registers (MDCR_EL2_TDOSA)
>> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>> + * debug related registers.
>>   *
>>   * Additionally, KVM only traps guest accesses to the debug registers if
>>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
>> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>  
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>  {
>> -	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>  	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
>>  
>>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
>>  
>> -	/*
>> -	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>> -	 * to the profiling buffer.
>> -	 */
>> -	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
>> -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>> -				MDCR_EL2_TPMS |
>> -				MDCR_EL2_TPMCR |
>> -				MDCR_EL2_TDRA |
>> -				MDCR_EL2_TDOSA);
>> +	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
>>  
>>  	/* Is Guest debugging in effect? */
>>  	if (vcpu->guest_debug) {
>> -		/* Route all software debug exceptions to EL2 */
>> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>> -
>>  		/* Save guest debug state */
>>  		save_guest_debug_regs(vcpu);
>>  
>> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>  
>>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
>>  			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>> -			trap_debug = true;
> There is something that slightly worries me here: there is now a
> disconnect between flagging debug as dirty and setting the
> trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
> set the trap bits *before* setting the dirty bit itself.
>
> Here, I believe you end up with guest/host confusion of breakpoints,
> which isn't great. Or did I miss something?

I'm sorry, but I don't understand what you mean. This is my understanding of what
is happening.

Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY flag
is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug is
only used when computing mdcr_el2.

With this patch, trap_debug is set to true if vcpu->guest_debug &
KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this happens in
kvm_arm_setup_mdcr_el2(), which is called at the start of kvm_arm_setup_debug().
The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if
vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before.

The guest never runs with the value computed in kvm_vcpu_first_run_init() unless
it's identical with the value recomputed in kvm_arm_setup_debug().

The only difference I see is that mdcr_el2 is computed at the start of
kvm_arm_setup_debug(). I get the feeling I'm also missing something.

Thanks,

Alex

>
>>  
>>  			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
>>  						&vcpu->arch.debug_ptr->dbg_bcr[0],
>> @@ -189,10 +224,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>  	BUG_ON(!vcpu->guest_debug &&
>>  		vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
>>  
>> -	/* Trap debug register access */
>> -	if (trap_debug)
>> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>> -
>>  	/* If KDE or MDE are set, perform a full save/restore cycle. */
>>  	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
>>  		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>> @@ -201,7 +232,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>  	if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
>>  		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
>>  
>> -	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>>  	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
>>  }
> Thanks,
>
> 	M.
>
Alexandru Elisei March 30, 2021, 5:49 p.m. UTC | #3
Hi Marc,

On 3/30/21 6:13 PM, Alexandru Elisei wrote:
> [..]
>>> +}
>>> +
>>>  /**
>>>   * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
>>>   */
>>> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>>   * @vcpu:	the vcpu pointer
>>>   *
>>>   * This is called before each entry into the hypervisor to setup any
>>> - * debug related registers. Currently this just ensures we will trap
>>> - * access to:
>>> - *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>>> - *  - Debug ROM Address (MDCR_EL2_TDRA)
>>> - *  - OS related registers (MDCR_EL2_TDOSA)
>>> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>>> + * debug related registers.
>>>   *
>>>   * Additionally, KVM only traps guest accesses to the debug registers if
>>>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
>>> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>>  
>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>  {
>>> -	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>>  	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
>>>  
>>>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
>>>  
>>> -	/*
>>> -	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>>> -	 * to the profiling buffer.
>>> -	 */
>>> -	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
>>> -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>>> -				MDCR_EL2_TPMS |
>>> -				MDCR_EL2_TPMCR |
>>> -				MDCR_EL2_TDRA |
>>> -				MDCR_EL2_TDOSA);
>>> +	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
>>>  
>>>  	/* Is Guest debugging in effect? */
>>>  	if (vcpu->guest_debug) {
>>> -		/* Route all software debug exceptions to EL2 */
>>> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>>> -
>>>  		/* Save guest debug state */
>>>  		save_guest_debug_regs(vcpu);
>>>  
>>> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>  
>>>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
>>>  			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>>> -			trap_debug = true;
>> There is something that slightly worries me here: there is now a
>> disconnect between flagging debug as dirty and setting the
>> trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
>> set the trap bits *before* setting the dirty bit itself.
>>
>> Here, I believe you end up with guest/host confusion of breakpoints,
>> which isn't great. Or did I miss something?
> I'm sorry, but I don't understand what you mean. This is my understanding of what
> is happening.
>
> Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY flag
> is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug is
> only used when computing mdcr_el2.
>
> With this patch, trap_debug is set to true if vcpu->guest_debug &
> KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this happens in
> kvm_arm_setup_mdcr_el2(), which is called at the start of kvm_arm_setup_debug().
> The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if
> vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before.
>
> The guest never runs with the value computed in kvm_vcpu_first_run_init() unless
> it's identical with the value recomputed in kvm_arm_setup_debug().
>
> The only difference I see is that mdcr_el2 is computed at the start of
> kvm_arm_setup_debug(). I get the feeling I'm also missing something.

I think I understand what you mean, you are worried that we won't set the bit in
mdcr_el2 to trap debug in the same place where we set the debug dirty flag. If
that's the case, then I can move kvm_arm_setup_mdcr_el2 right after the BUG_ON()
and remove the KVM_GUESTDBG_USE_HW check because the KVM_ARM_DEBUG_DIRTY would be
already set.

Question though, if mdcr_el2 is tied to the debug dirty flag, we ignore the flag
here (code without this patch):

    BUG_ON(!vcpu->guest_debug &&
        vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);

    /* Trap debug register access */
    if (trap_debug)
        vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;

    /* If KDE or MDE are set, perform a full save/restore cycle. */
    if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
        vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;

I suppose there's something I don't understand yet about how this is supposed to work.

Thanks,

Alex

>
> Thanks,
>
> Alex
>
>>>  
>>>  			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
>>>  						&vcpu->arch.debug_ptr->dbg_bcr[0],
>>> @@ -189,10 +224,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>  	BUG_ON(!vcpu->guest_debug &&
>>>  		vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
>>>  
>>> -	/* Trap debug register access */
>>> -	if (trap_debug)
>>> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>>> -
>>>  	/* If KDE or MDE are set, perform a full save/restore cycle. */
>>>  	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
>>>  		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>>> @@ -201,7 +232,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>  	if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
>>>  		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
>>>  
>>> -	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>>>  	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
>>>  }
>> Thanks,
>>
>> 	M.
>>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Marc Zyngier March 30, 2021, 7:57 p.m. UTC | #4
On Tue, 30 Mar 2021 18:49:54 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On 3/30/21 6:13 PM, Alexandru Elisei wrote:
> > [..]
> >>> +}
> >>> +
> >>>  /**
> >>>   * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
> >>>   */
> >>> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
> >>>   * @vcpu:	the vcpu pointer
> >>>   *
> >>>   * This is called before each entry into the hypervisor to setup any
> >>> - * debug related registers. Currently this just ensures we will trap
> >>> - * access to:
> >>> - *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> >>> - *  - Debug ROM Address (MDCR_EL2_TDRA)
> >>> - *  - OS related registers (MDCR_EL2_TDOSA)
> >>> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> >>> + * debug related registers.
> >>>   *
> >>>   * Additionally, KVM only traps guest accesses to the debug registers if
> >>>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> >>> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
> >>>  
> >>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
> >>>  {
> >>> -	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
> >>>  	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
> >>>  
> >>>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
> >>>  
> >>> -	/*
> >>> -	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
> >>> -	 * to the profiling buffer.
> >>> -	 */
> >>> -	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
> >>> -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> >>> -				MDCR_EL2_TPMS |
> >>> -				MDCR_EL2_TPMCR |
> >>> -				MDCR_EL2_TDRA |
> >>> -				MDCR_EL2_TDOSA);
> >>> +	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
> >>>  
> >>>  	/* Is Guest debugging in effect? */
> >>>  	if (vcpu->guest_debug) {
> >>> -		/* Route all software debug exceptions to EL2 */
> >>> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> >>> -
> >>>  		/* Save guest debug state */
> >>>  		save_guest_debug_regs(vcpu);
> >>>  
> >>> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
> >>>  
> >>>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
> >>>  			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
> >>> -			trap_debug = true;
> >> There is something that slightly worries me here: there is now a
> >> disconnect between flagging debug as dirty and setting the
> >> trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
> >> set the trap bits *before* setting the dirty bit itself.
> >>
> >> Here, I believe you end up with guest/host confusion of breakpoints,
> >> which isn't great. Or did I miss something?
> > I'm sorry, but I don't understand what you mean. This is my understanding of what
> > is happening.
> >
> > Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY flag
> > is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug is
> > only used when computing mdcr_el2.
> >
> > With this patch, trap_debug is set to true if vcpu->guest_debug &
> > KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this happens in
> > kvm_arm_setup_mdcr_el2(), which is called at the start of kvm_arm_setup_debug().
> > The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if
> > vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before.
> >
> > The guest never runs with the value computed in kvm_vcpu_first_run_init() unless
> > it's identical with the value recomputed in kvm_arm_setup_debug().
> >
> > The only difference I see is that mdcr_el2 is computed at the start of
> > kvm_arm_setup_debug(). I get the feeling I'm also missing something.
> 
> I think I understand what you mean, you are worried that we won't
> set the bit in mdcr_el2 to trap debug in the same place where we set
> the debug dirty flag.

Yes, that's what I mean. The code is conceptually as such ATM:

	debug_trap = (something based on vcpu->flags);
	if (something else) {
		check stuff;
		vcpu->flags |= stuff;
		debug_trap = true;
	}

	if (debug_trap)
		set trap conditions;

You are turning this into:

	debug_trap = (something based on vcpu->flags);
	if (debug_trap) {
		set trap conditions;
	}
	if (something else) {
		check stuff;
		vcpu->flags |= stuff;
	}

which isn't the same thing. In your case, it probably works because of
KVM_GUESTDBG_USE_HW, but that's really hard to follow, and we have had
so many bugs in the debug code that it really needs to be kept as
stupid as possible.

> If that's the case, then I can move kvm_arm_setup_mdcr_el2 right
> after the BUG_ON() and remove the KVM_GUESTDBG_USE_HW check because
> the KVM_ARM_DEBUG_DIRTY would be already set.

Yes, I think that'd be better.

> Question though, if mdcr_el2 is tied to the debug dirty flag, we
> ignore the flag here (code without this patch):
> 
>     BUG_ON(!vcpu->guest_debug &&
>         vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
> 
>     /* Trap debug register access */
>     if (trap_debug)
>         vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
> 
>     /* If KDE or MDE are set, perform a full save/restore cycle. */
>     if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
>         vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
> 
> I suppose there's something I don't understand yet about how this is
> supposed to work.

The idea (IIRC) is that if MDSCR_EL1.KDE or MDSCR_EL1.MDE are set,
that's because the guest is currently debugging, and that we are
better off saying that the debug state is dirty, forcing a
save/restore cycle on entry.

You may want to dig into the git history for more accurate
information...

Hope this helps,

	M.
Marc Zyngier March 30, 2021, 8:07 p.m. UTC | #5
On Tue, 30 Mar 2021 18:13:07 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> Thanks for having a look!
> 
> On 3/30/21 10:55 AM, Marc Zyngier wrote:
> > Hi Alex,
> >
> > On Tue, 23 Mar 2021 18:00:57 +0000,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
> >> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
> >> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
> >> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
> >> result is that KVM executes for a short time with MDCR_EL2 set to zero.
> >>
> >> This has several unintended consequences:
> >>
> >> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
> >>   DDI 0487G.a, page D13-3820. The behavior specified by the architecture
> >>   in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
> >>   value less than or equal to PMCR_EL0.N, which means that an unknown
> >>   number of counters are now disabled by MDCR_EL2.HPME, which is zero.
> >>
> >> * The host configuration for the other debug features controlled by
> >>   MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
> >>   doesn't use the other fields, but that might change in the future.
> >>
> >> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
> >> kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
> >> has a consistent value after each vcpu_load().
> >>
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
> > arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
> > different point. Probably worth a Fixes tag.
> 
> This bug is present in the commit you are mentioning, and from what
> I can tell it's also present in the commit it's fixing (d5a21bcc2995
> ("KVM: arm64: Move common VHE/non-VHE trap config in separate
> functions")) - vcpu->arch.mdcr_el2 is computed in
> kvm_arm_setup_debug(), which is called after vcpu_load(). My guess
> is that this bug is from VHE support was added (or soon after).

Right. Can you please add a Fixes: tag for the same commit? At least
that'd be consistent.

> I can dig further, how far back in time should I aim for?
> 
> >
> >> ---
> >> Found by code inspection. Based on v5.12-rc4.
> >>
> >> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
> >> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
> >> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
> >> value, 0xe66 (FEAT_SPE is not implemented by the PE).
> >>
> >> This patch was initially part of the KVM SPE series [1], but those patches
> >> haven't seen much activity, so I thought it would be a good idea to send
> >> this patch separately to draw more attention to it.
> >>
> >> Changes in v2:
> >> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
> >>   vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
> >> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
> >>   vcpu->guest_debug means.
> >>
> >> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
> >>
> >>  arch/arm64/include/asm/kvm_host.h |  1 +
> >>  arch/arm64/kvm/arm.c              |  3 +-
> >>  arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
> >>  3 files changed, 59 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 3d10e6527f7d..858c2fcfc043 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> >>  
> >>  void kvm_arm_init_debug(void);
> >> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >> index 7f06ba76698d..7088d8fe7186 100644
> >> --- a/arch/arm64/kvm/arm.c
> >> +++ b/arch/arm64/kvm/arm.c
> >> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> >>  
> >>  	vcpu->arch.has_run_once = true;
> >>  
> >> +	kvm_arm_vcpu_init_debug(vcpu);
> >> +
> >>  	if (likely(irqchip_in_kernel(kvm))) {
> >>  		/*
> >>  		 * Map the VGIC hardware resources before running a vcpu the
> >> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >>  		}
> >>  
> >>  		kvm_arm_setup_debug(vcpu);
> >> -
> > Spurious change?
> 
> Definitely, thank you for spotting it.
> 
> >
> >>  		/**************************************************************
> >>  		 * Enter the guest
> >>  		 */
> >> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> >> index 7a7e425616b5..3626d03354f6 100644
> >> --- a/arch/arm64/kvm/debug.c
> >> +++ b/arch/arm64/kvm/debug.c
> >> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
> >>  	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
> >>  }
> >>  
> >> +/**
> >> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
> >> + *
> >> + * @vcpu:	the vcpu pointer
> >> + * @host_mdcr:  host mdcr_el2 value
> >> + *
> >> + * This ensures we will trap access to:
> >> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> >> + *  - Debug ROM Address (MDCR_EL2_TDRA)
> >> + *  - OS related registers (MDCR_EL2_TDOSA)
> >> + *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> >> + */
> >> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
> >> +{
> >> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
> >> +
> >> +	/*
> >> +	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
> >> +	 * to the profiling buffer.
> >> +	 */
> >> +	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
> >> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> >> +				MDCR_EL2_TPMS |
> >> +				MDCR_EL2_TPMCR |
> >> +				MDCR_EL2_TDRA |
> >> +				MDCR_EL2_TDOSA);
> >> +
> >> +	/* Is the VM being debugged by userspace? */
> >> +	if (vcpu->guest_debug) {
> >> +		/* Route all software debug exceptions to EL2 */
> >> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> >> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
> >> +			trap_debug = true;
> >> +	}
> >> +
> >> +	/* Trap debug register access */
> >> +	if (trap_debug)
> >> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
> >> +
> >> +	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
> >> +}
> >> +
> >> +/**
> >> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
> >> + *
> >> + * @vcpu:	the vcpu pointer
> >> + *
> >> + * Set vcpu initial mdcr_el2 value.
> >> + */
> >> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
> >> +{
> >> +	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
> > Given that kvm_arm_setup_mdcr_el2() always takes the current host
> > value for mdcr_el2, why not moving the read into it and be done with
> > it?
> 
> kvm_arm_setup_debug() is called with preemption disabled, and it can
> use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with
> preemption enabled, so it must use this_cpu_read(). I wanted to make
> the distinction because kvm_arm_setup_debug() is in the run loop.

I think it would be absolutely fine to make the slow path of
kvm_vcpu_first_run_init() run with preempt disabled. This happens so
rarely that that it isn't worth thinking about it.

Please give it a lockdep run though! ;-)

> 
> >
> > Also, do we really need an extra wrapper?
> 
> I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use
> this_cpu_read() for the host's mdcr_el2 value at the cost of a
> preempt disable/enable in the run loop when preemption is
> disabled. If you think that would make the code easier to follow, I
> can certainly do that.

As explained above, I'd rather you keep the __this_cpu_read() and make
kvm_vcpu_first_run_init() preemption safe.

Thanks,

	M.
Alexandru Elisei March 31, 2021, 10:48 a.m. UTC | #6
Hi Marc,

On 3/30/21 8:57 PM, Marc Zyngier wrote:
> On Tue, 30 Mar 2021 18:49:54 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> Hi Marc,
>>
>> On 3/30/21 6:13 PM, Alexandru Elisei wrote:
>>> [..]
>>>>> +}
>>>>> +
>>>>>  /**
>>>>>   * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
>>>>>   */
>>>>> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>>>>   * @vcpu:	the vcpu pointer
>>>>>   *
>>>>>   * This is called before each entry into the hypervisor to setup any
>>>>> - * debug related registers. Currently this just ensures we will trap
>>>>> - * access to:
>>>>> - *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>>>>> - *  - Debug ROM Address (MDCR_EL2_TDRA)
>>>>> - *  - OS related registers (MDCR_EL2_TDOSA)
>>>>> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>>>>> + * debug related registers.
>>>>>   *
>>>>>   * Additionally, KVM only traps guest accesses to the debug registers if
>>>>>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
>>>>> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>>>>  
>>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>> -	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>>>>  	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
>>>>>  
>>>>>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
>>>>>  
>>>>> -	/*
>>>>> -	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>>>>> -	 * to the profiling buffer.
>>>>> -	 */
>>>>> -	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
>>>>> -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>>>>> -				MDCR_EL2_TPMS |
>>>>> -				MDCR_EL2_TPMCR |
>>>>> -				MDCR_EL2_TDRA |
>>>>> -				MDCR_EL2_TDOSA);
>>>>> +	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
>>>>>  
>>>>>  	/* Is Guest debugging in effect? */
>>>>>  	if (vcpu->guest_debug) {
>>>>> -		/* Route all software debug exceptions to EL2 */
>>>>> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>>>>> -
>>>>>  		/* Save guest debug state */
>>>>>  		save_guest_debug_regs(vcpu);
>>>>>  
>>>>> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>>>  
>>>>>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
>>>>>  			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>>>>> -			trap_debug = true;
>>>> There is something that slightly worries me here: there is now a
>>>> disconnect between flagging debug as dirty and setting the
>>>> trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
>>>> set the trap bits *before* setting the dirty bit itself.
>>>>
>>>> Here, I believe you end up with guest/host confusion of breakpoints,
>>>> which isn't great. Or did I miss something?
>>> I'm sorry, but I don't understand what you mean. This is my understanding of what
>>> is happening.
>>>
>>> Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY flag
>>> is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug is
>>> only used when computing mdcr_el2.
>>>
>>> With this patch, trap_debug is set to true if vcpu->guest_debug &
>>> KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this happens in
>>> kvm_arm_setup_mdcr_el2(), which is called at the start of kvm_arm_setup_debug().
>>> The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if
>>> vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before.
>>>
>>> The guest never runs with the value computed in kvm_vcpu_first_run_init() unless
>>> it's identical with the value recomputed in kvm_arm_setup_debug().
>>>
>>> The only difference I see is that mdcr_el2 is computed at the start of
>>> kvm_arm_setup_debug(). I get the feeling I'm also missing something.
>> I think I understand what you mean, you are worried that we won't
>> set the bit in mdcr_el2 to trap debug in the same place where we set
>> the debug dirty flag.
> Yes, that's what I mean. The code is conceptually as such ATM:
>
> 	debug_trap = (something based on vcpu->flags);
> 	if (something else) {
> 		check stuff;
> 		vcpu->flags |= stuff;
> 		debug_trap = true;
> 	}
>
> 	if (debug_trap)
> 		set trap conditions;
>
> You are turning this into:
>
> 	debug_trap = (something based on vcpu->flags);
> 	if (debug_trap) {
> 		set trap conditions;
> 	}
> 	if (something else) {
> 		check stuff;
> 		vcpu->flags |= stuff;
> 	}
>
> which isn't the same thing. In your case, it probably works because of
> KVM_GUESTDBG_USE_HW, but that's really hard to follow, and we have had
> so many bugs in the debug code that it really needs to be kept as
> stupid as possible.
>
>> If that's the case, then I can move kvm_arm_setup_mdcr_el2 right
>> after the BUG_ON() and remove the KVM_GUESTDBG_USE_HW check because
>> the KVM_ARM_DEBUG_DIRTY would be already set.
> Yes, I think that'd be better.

Ok, I'll do that then.

>
>> Question though, if mdcr_el2 is tied to the debug dirty flag, we
>> ignore the flag here (code without this patch):
>>
>>     BUG_ON(!vcpu->guest_debug &&
>>         vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
>>
>>     /* Trap debug register access */
>>     if (trap_debug)
>>         vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>>
>>     /* If KDE or MDE are set, perform a full save/restore cycle. */
>>     if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
>>         vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>>
>> I suppose there's something I don't understand yet about how this is
>> supposed to work.
> The idea (IIRC) is that if MDSCR_EL1.KDE or MDSCR_EL1.MDE are set,
> that's because the guest is currently debugging, and that we are
> better off saying that the debug state is dirty, forcing a
> save/restore cycle on entry.

That made things a bit clearer, thanks.

Thanks,

Alex

>
> You may want to dig into the git history for more accurate
> information...
>
> Hope this helps,
>
> 	M.
>
Alexandru Elisei March 31, 2021, 3:25 p.m. UTC | #7
Hi Marc,

On 3/30/21 9:07 PM, Marc Zyngier wrote:
> On Tue, 30 Mar 2021 18:13:07 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> Hi Marc,
>>
>> Thanks for having a look!
>>
>> On 3/30/21 10:55 AM, Marc Zyngier wrote:
>>> Hi Alex,
>>>
>>> On Tue, 23 Mar 2021 18:00:57 +0000,
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
>>>> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
>>>> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
>>>> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
>>>> result is that KVM executes for a short time with MDCR_EL2 set to zero.
>>>>
>>>> This has several unintended consequences:
>>>>
>>>> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
>>>>   DDI 0487G.a, page D13-3820. The behavior specified by the architecture
>>>>   in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
>>>>   value less than or equal to PMCR_EL0.N, which means that an unknown
>>>>   number of counters are now disabled by MDCR_EL2.HPME, which is zero.
>>>>
>>>> * The host configuration for the other debug features controlled by
>>>>   MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
>>>>   doesn't use the other fields, but that might change in the future.
>>>>
>>>> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
>>>> kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
>>>> has a consistent value after each vcpu_load().
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
>>> arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
>>> different point. Probably worth a Fixes tag.
>> This bug is present in the commit you are mentioning, and from what
>> I can tell it's also present in the commit it's fixing (d5a21bcc2995
>> ("KVM: arm64: Move common VHE/non-VHE trap config in separate
>> functions")) - vcpu->arch.mdcr_el2 is computed in
>> kvm_arm_setup_debug(), which is called after vcpu_load(). My guess
>> is that this bug is from VHE support was added (or soon after).
> Right. Can you please add a Fixes: tag for the same commit? At least
> that'd be consistent.

Yes, I'll do that.

>
>> I can dig further, how far back in time should I aim for?
>>
>>>> ---
>>>> Found by code inspection. Based on v5.12-rc4.
>>>>
>>>> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
>>>> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
>>>> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
>>>> value, 0xe66 (FEAT_SPE is not implemented by the PE).
>>>>
>>>> This patch was initially part of the KVM SPE series [1], but those patches
>>>> haven't seen much activity, so I thought it would be a good idea to send
>>>> this patch separately to draw more attention to it.
>>>>
>>>> Changes in v2:
>>>> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
>>>>   vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
>>>> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
>>>>   vcpu->guest_debug means.
>>>>
>>>> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
>>>>
>>>>  arch/arm64/include/asm/kvm_host.h |  1 +
>>>>  arch/arm64/kvm/arm.c              |  3 +-
>>>>  arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
>>>>  3 files changed, 59 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 3d10e6527f7d..858c2fcfc043 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>>  
>>>>  void kvm_arm_init_debug(void);
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index 7f06ba76698d..7088d8fe7186 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>>>  
>>>>  	vcpu->arch.has_run_once = true;
>>>>  
>>>> +	kvm_arm_vcpu_init_debug(vcpu);
>>>> +
>>>>  	if (likely(irqchip_in_kernel(kvm))) {
>>>>  		/*
>>>>  		 * Map the VGIC hardware resources before running a vcpu the
>>>> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>>>  		}
>>>>  
>>>>  		kvm_arm_setup_debug(vcpu);
>>>> -
>>> Spurious change?
>> Definitely, thank you for spotting it.
>>
>>>>  		/**************************************************************
>>>>  		 * Enter the guest
>>>>  		 */
>>>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>>>> index 7a7e425616b5..3626d03354f6 100644
>>>> --- a/arch/arm64/kvm/debug.c
>>>> +++ b/arch/arm64/kvm/debug.c
>>>> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
>>>>  	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
>>>>  }
>>>>  
>>>> +/**
>>>> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
>>>> + *
>>>> + * @vcpu:	the vcpu pointer
>>>> + * @host_mdcr:  host mdcr_el2 value
>>>> + *
>>>> + * This ensures we will trap access to:
>>>> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>>>> + *  - Debug ROM Address (MDCR_EL2_TDRA)
>>>> + *  - OS related registers (MDCR_EL2_TDOSA)
>>>> + *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>>>> + */
>>>> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
>>>> +{
>>>> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>>> +
>>>> +	/*
>>>> +	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>>>> +	 * to the profiling buffer.
>>>> +	 */
>>>> +	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
>>>> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>>>> +				MDCR_EL2_TPMS |
>>>> +				MDCR_EL2_TPMCR |
>>>> +				MDCR_EL2_TDRA |
>>>> +				MDCR_EL2_TDOSA);
>>>> +
>>>> +	/* Is the VM being debugged by userspace? */
>>>> +	if (vcpu->guest_debug) {
>>>> +		/* Route all software debug exceptions to EL2 */
>>>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>>>> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
>>>> +			trap_debug = true;
>>>> +	}
>>>> +
>>>> +	/* Trap debug register access */
>>>> +	if (trap_debug)
>>>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>>>> +
>>>> +	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
>>>> + *
>>>> + * @vcpu:	the vcpu pointer
>>>> + *
>>>> + * Set vcpu initial mdcr_el2 value.
>>>> + */
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
>>> Given that kvm_arm_setup_mdcr_el2() always takes the current host
>>> value for mdcr_el2, why not moving the read into it and be done with
>>> it?
>> kvm_arm_setup_debug() is called with preemption disabled, and it can
>> use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with
>> preemption enabled, so it must use this_cpu_read(). I wanted to make
>> the distinction because kvm_arm_setup_debug() is in the run loop.
> I think it would be absolutely fine to make the slow path of
> kvm_vcpu_first_run_init() run with preempt disabled. This happens so
> rarely that that it isn't worth thinking about it.

It looks to me like it's a bit too heavy-handed to run the entire function
kvm_vcpu_first_run_init() with preemption disabled just for __this_cpu_read() in
kvm_arm_setup_mdcr_el2(). Not because of the performance cost (it's negligible, as
it's called exactly once in the VCPU lifetime), but because it's not obvious why
it is needed.

I tried this:

@@ -580,7 +580,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 
        vcpu->arch.has_run_once = true;
 
-       kvm_arm_vcpu_init_debug(vcpu);
+       preempt_disable();
+       kvm_arm_setup_mdcr_el2(vcpu);
+       preempt_enable();
 
        if (likely(irqchip_in_kernel(kvm))) {
                /*

and it still looks a bit off to me because preemption needs to be disabled because
of an implementation detail in kvm_arm_setup_mdcr_el2(), as the function operates
on the VCPU struct and preemption can be enabled for that.

I was thinking something like this:

@@ -119,7 +119,9 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32
host_mdcr)
  */
 void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
 {
-       kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
+       preempt_disable();
+       kvm_arm_setup_mdcr_el2(vcpu);
+       preempt_enable();
 }
 
 /**

What do you think?

Thanks,

Alex

>
> Please give it a lockdep run though! ;-)
>
>>> Also, do we really need an extra wrapper?
>> I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use
>> this_cpu_read() for the host's mdcr_el2 value at the cost of a
>> preempt disable/enable in the run loop when preemption is
>> disabled. If you think that would make the code easier to follow, I
>> can certainly do that.
> As explained above, I'd rather you keep the __this_cpu_read() and make
> kvm_vcpu_first_run_init() preemption safe.
>
> Thanks,
>
> 	M.
>
Marc Zyngier March 31, 2021, 3:35 p.m. UTC | #8
On Wed, 31 Mar 2021 16:25:46 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On 3/30/21 9:07 PM, Marc Zyngier wrote:

[...]

> > I think it would be absolutely fine to make the slow path of
> > kvm_vcpu_first_run_init() run with preempt disabled. This happens so
> > rarely that that it isn't worth thinking about it.
> 
> It looks to me like it's a bit too heavy-handed to run the entire
> function kvm_vcpu_first_run_init() with preemption disabled just for
> __this_cpu_read() in kvm_arm_setup_mdcr_el2(). Not because of the
> performance cost (it's negligible, as it's called exactly once in
> the VCPU lifetime), but because it's not obvious why it is needed.
> 
> I tried this:
> 
> @@ -580,7 +580,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>  
>         vcpu->arch.has_run_once = true;
>  
> -       kvm_arm_vcpu_init_debug(vcpu);
> +       preempt_disable();
> +       kvm_arm_setup_mdcr_el2(vcpu);
> +       preempt_enable();
>  
>         if (likely(irqchip_in_kernel(kvm))) {
>                 /*
> 
> and it still looks a bit off to me because preemption needs to be
> disabled because of an implementation detail in
> kvm_arm_setup_mdcr_el2(), as the function operates on the VCPU
> struct and preemption can be enabled for that.
> 
> I was thinking something like this:
> 
> @@ -119,7 +119,9 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32
> host_mdcr)
>   */
>  void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
>  {
> -       kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
> +       preempt_disable();
> +       kvm_arm_setup_mdcr_el2(vcpu);
> +       preempt_enable();
>  }
>  
>  /**
> 
> What do you think?

I'm fine with it either way, so pick you favourite!

Thanks,

	M.
Alexandru Elisei April 1, 2021, 1:55 p.m. UTC | #9
Hi Marc,

On 3/30/21 8:57 PM, Marc Zyngier wrote:
> On Tue, 30 Mar 2021 18:49:54 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> Hi Marc,
>>
>> On 3/30/21 6:13 PM, Alexandru Elisei wrote:
>>> [..]
>>>>> +}
>>>>> +
>>>>>  /**
>>>>>   * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
>>>>>   */
>>>>> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>>>>   * @vcpu:	the vcpu pointer
>>>>>   *
>>>>>   * This is called before each entry into the hypervisor to setup any
>>>>> - * debug related registers. Currently this just ensures we will trap
>>>>> - * access to:
>>>>> - *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>>>>> - *  - Debug ROM Address (MDCR_EL2_TDRA)
>>>>> - *  - OS related registers (MDCR_EL2_TDOSA)
>>>>> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>>>>> + * debug related registers.
>>>>>   *
>>>>>   * Additionally, KVM only traps guest accesses to the debug registers if
>>>>>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
>>>>> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>>>>>  
>>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>> -	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>>>>  	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
>>>>>  
>>>>>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
>>>>>  
>>>>> -	/*
>>>>> -	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>>>>> -	 * to the profiling buffer.
>>>>> -	 */
>>>>> -	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
>>>>> -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>>>>> -				MDCR_EL2_TPMS |
>>>>> -				MDCR_EL2_TPMCR |
>>>>> -				MDCR_EL2_TDRA |
>>>>> -				MDCR_EL2_TDOSA);
>>>>> +	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
>>>>>  
>>>>>  	/* Is Guest debugging in effect? */
>>>>>  	if (vcpu->guest_debug) {
>>>>> -		/* Route all software debug exceptions to EL2 */
>>>>> -		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>>>>> -
>>>>>  		/* Save guest debug state */
>>>>>  		save_guest_debug_regs(vcpu);
>>>>>  
>>>>> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>>>>>  
>>>>>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
>>>>>  			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>>>>> -			trap_debug = true;
>>>> There is something that slightly worries me here: there is now a
>>>> disconnect between flagging debug as dirty and setting the
>>>> trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
>>>> set the trap bits *before* setting the dirty bit itself.
>>>>
>>>> Here, I believe you end up with guest/host confusion of breakpoints,
>>>> which isn't great. Or did I miss something?
>>> I'm sorry, but I don't understand what you mean. This is my understanding of what
>>> is happening.
>>>
>>> Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY flag
>>> is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug is
>>> only used when computing mdcr_el2.
>>>
>>> With this patch, trap_debug is set to true if vcpu->guest_debug &
>>> KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this happens in
>>> kvm_arm_setup_mdcr_el2(), which is called at the start of kvm_arm_setup_debug().
>>> The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if
>>> vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before.
>>>
>>> The guest never runs with the value computed in kvm_vcpu_first_run_init() unless
>>> it's identical with the value recomputed in kvm_arm_setup_debug().
>>>
>>> The only difference I see is that mdcr_el2 is computed at the start of
>>> kvm_arm_setup_debug(). I get the feeling I'm also missing something.
>> I think I understand what you mean, you are worried that we won't
>> set the bit in mdcr_el2 to trap debug in the same place where we set
>> the debug dirty flag.
> Yes, that's what I mean. The code is conceptually as such ATM:
>
> 	debug_trap = (something based on vcpu->flags);
> 	if (something else) {
> 		check stuff;
> 		vcpu->flags |= stuff;
> 		debug_trap = true;
> 	}
>
> 	if (debug_trap)
> 		set trap conditions;
>
> You are turning this into:
>
> 	debug_trap = (something based on vcpu->flags);
> 	if (debug_trap) {
> 		set trap conditions;
> 	}
> 	if (something else) {
> 		check stuff;
> 		vcpu->flags |= stuff;
> 	}
>
> which isn't the same thing. In your case, it probably works because of
> KVM_GUESTDBG_USE_HW, but that's really hard to follow, and we have had
> so many bugs in the debug code that it really needs to be kept as
> stupid as possible.
>
>> If that's the case, then I can move kvm_arm_setup_mdcr_el2 right
>> after the BUG_ON() and remove the KVM_GUESTDBG_USE_HW check because
>> the KVM_ARM_DEBUG_DIRTY would be already set.
> Yes, I think that'd be better.
>
Had another go at this, and as I was looking at the code, I realized that
conceptually, trapping debug registers access (MDCR_EL2.TDA) is tied to:

- KVM_ARM64_DEBUG_DIRTY *not* being set (guest is debugging itself and KVM will
world-switch the debug registers).

- KVM_GUESTDBG_USE_HW being set, which also *sets* KVM_ARM64_DEBUG_DIRTY (host is
debugging the guest using hardware breakpoints).

So I cannot set the MDCR_EL2.TDA bit based on KVM_ARM64_DEBUG_DIRTY, because I
would lose one of the two cases. It looks to me that keeping
kvm_arm_setup_mdcr_el2() unchanged and calling it at the start of
kvm_arm_setup_debug() is the way to go here.

Thanks,

Alex
Marc Zyngier April 1, 2021, 3:22 p.m. UTC | #10
On Thu, 01 Apr 2021 14:55:54 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

[...]

> Had another go at this, and as I was looking at the code, I realized that
> conceptually, trapping debug registers access (MDCR_EL2.TDA) is tied to:
> 
> - KVM_ARM64_DEBUG_DIRTY *not* being set (guest is debugging itself and KVM will
> world-switch the debug registers).
> 
> - KVM_GUESTDBG_USE_HW being set, which also *sets* KVM_ARM64_DEBUG_DIRTY (host is
> debugging the guest using hardware breakpoints).
> 
> So I cannot set the MDCR_EL2.TDA bit based on KVM_ARM64_DEBUG_DIRTY,
> because I would lose one of the two cases. It looks to me that
> keeping kvm_arm_setup_mdcr_el2() unchanged and calling it at the
> start of kvm_arm_setup_debug() is the way to go here.

Just post the revised patch, and we'll take it from there.

Thanks,

	M.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3d10e6527f7d..858c2fcfc043 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -713,6 +713,7 @@  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
+void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7f06ba76698d..7088d8fe7186 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -580,6 +580,8 @@  static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.has_run_once = true;
 
+	kvm_arm_vcpu_init_debug(vcpu);
+
 	if (likely(irqchip_in_kernel(kvm))) {
 		/*
 		 * Map the VGIC hardware resources before running a vcpu the
@@ -791,7 +793,6 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		}
 
 		kvm_arm_setup_debug(vcpu);
-
 		/**************************************************************
 		 * Enter the guest
 		 */
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7a7e425616b5..3626d03354f6 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -68,6 +68,60 @@  void kvm_arm_init_debug(void)
 	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
 }
 
+/**
+ * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
+ *
+ * @vcpu:	the vcpu pointer
+ * @host_mdcr:  host mdcr_el2 value
+ *
+ * This ensures we will trap access to:
+ *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
+ *  - Debug ROM Address (MDCR_EL2_TDRA)
+ *  - OS related registers (MDCR_EL2_TDOSA)
+ *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
+ */
+static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
+{
+	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
+
+	/*
+	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
+	 * to the profiling buffer.
+	 */
+	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
+	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+				MDCR_EL2_TPMS |
+				MDCR_EL2_TPMCR |
+				MDCR_EL2_TDRA |
+				MDCR_EL2_TDOSA);
+
+	/* Is the VM being debugged by userspace? */
+	if (vcpu->guest_debug) {
+		/* Route all software debug exceptions to EL2 */
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
+		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
+			trap_debug = true;
+	}
+
+	/* Trap debug register access */
+	if (trap_debug)
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
+
+	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
+}
+
+/**
+ * kvm_arm_vcpu_init_debug - setup vcpu debug traps
+ *
+ * @vcpu:	the vcpu pointer
+ *
+ * Set vcpu initial mdcr_el2 value.
+ */
+void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
+{
+	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
+}
+
 /**
  * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
  */
@@ -83,12 +137,7 @@  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
  * @vcpu:	the vcpu pointer
  *
  * This is called before each entry into the hypervisor to setup any
- * debug related registers. Currently this just ensures we will trap
- * access to:
- *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
- *  - Debug ROM Address (MDCR_EL2_TDRA)
- *  - OS related registers (MDCR_EL2_TDOSA)
- *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
+ * debug related registers.
  *
  * Additionally, KVM only traps guest accesses to the debug registers if
  * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
@@ -100,27 +149,14 @@  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
 
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 {
-	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
 	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
 
 	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
 
-	/*
-	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
-	 * to the profiling buffer.
-	 */
-	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
-	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
-				MDCR_EL2_TPMS |
-				MDCR_EL2_TPMCR |
-				MDCR_EL2_TDRA |
-				MDCR_EL2_TDOSA);
+	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
 
 	/* Is Guest debugging in effect? */
 	if (vcpu->guest_debug) {
-		/* Route all software debug exceptions to EL2 */
-		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
-
 		/* Save guest debug state */
 		save_guest_debug_regs(vcpu);
 
@@ -174,7 +210,6 @@  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 
 			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
 			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
-			trap_debug = true;
 
 			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
 						&vcpu->arch.debug_ptr->dbg_bcr[0],
@@ -189,10 +224,6 @@  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	BUG_ON(!vcpu->guest_debug &&
 		vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
 
-	/* Trap debug register access */
-	if (trap_debug)
-		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
-
 	/* If KDE or MDE are set, perform a full save/restore cycle. */
 	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
@@ -201,7 +232,6 @@  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
 		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 
-	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
 	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
 }