SRCU dereference check warning with SEV-ES

Message ID 601f1278-17dd-7124-f328-b865447ca160@amd.com (mailing list archive)

Commit Message

Tom Lendacky May 5, 2021, 2:01 p.m. UTC
Boris noticed the below warning when running an SEV-ES guest with
CONFIG_PROVE_LOCKING=y.

The SRCU lock is released before invoking the vCPU run op where the SEV-ES
support will unmap the GHCB. Is the proper thing to do here to take the
SRCU lock around the call to kvm_vcpu_unmap() in this case? It does fix
the issue, but I just want to be sure that I shouldn't, instead, be taking
the memslot lock:

Thanks,
Tom 

[   97.455047] =============================
[   97.459150] WARNING: suspicious RCU usage
[   97.463259] 5.11.0 #1 Not tainted
[   97.466674] -----------------------------
[   97.470783] include/linux/kvm_host.h:641 suspicious rcu_dereference_check() usage!
[   97.478479] 
               other info that might help us debug this:

[   97.486606] 
               rcu_scheduler_active = 2, debug_locks = 1
[   97.493246] 1 lock held by qemu-system-x86/3793:
[   97.497967]  #0: ffff88810fe800c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x6d/0x650 [kvm]
[   97.507059] 
               stack backtrace:
[   97.511515] CPU: 0 PID: 3793 Comm: qemu-system-x86 Not tainted 5.11.0 #1
[   97.518335] Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
[   97.525849] Call Trace:
[   97.528385]  dump_stack+0x77/0x97
[   97.531832]  kvm_vcpu_gfn_to_memslot+0x168/0x170 [kvm]
[   97.537172]  kvm_vcpu_unmap+0x28/0x60 [kvm]
[   97.541548]  pre_sev_run+0x122/0x250 [kvm_amd]
[   97.546132]  svm_vcpu_run+0x505/0x760 [kvm_amd]
[   97.550806]  kvm_arch_vcpu_ioctl_run+0xe03/0x1c20 [kvm]
[   97.556251]  ? kvm_arch_vcpu_ioctl_run+0xb9/0x1c20 [kvm]
[   97.561780]  ? __lock_acquire+0x38e/0x1c30
[   97.566031]  kvm_vcpu_ioctl+0x222/0x650 [kvm]
[   97.570585]  ? __fget_files+0xe3/0x190
[   97.574459]  ? __fget_files+0x5/0x190
[   97.578254]  __x64_sys_ioctl+0x98/0xd0
[   97.582130]  ? lockdep_hardirqs_on+0x88/0x120
[   97.586625]  do_syscall_64+0x34/0x50
[   97.590331]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   97.595521] RIP: 0033:0x7f728ffeb457
[   97.599222] Code: 00 00 90 48 8b 05 39 7a 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 7a 0c 00 f7 d8 64 89 01 48
[   97.618277] RSP: 002b:00007f7287ffe7e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   97.626029] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f728ffeb457
[   97.633315] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
[   97.640595] RBP: 000055d927433480 R08: 000055d924adb278 R09: 00000000ffffffff
[   97.647877] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[   97.655156] R13: 00007f7290939004 R14: 0000000000000cf8 R15: 0000000000000000

Comments

Paolo Bonzini May 5, 2021, 4:16 p.m. UTC | #1
On 05/05/21 16:01, Tom Lendacky wrote:
> Boris noticed the below warning when running an SEV-ES guest with
> CONFIG_PROVE_LOCKING=y.
> 
> The SRCU lock is released before invoking the vCPU run op where the SEV-ES
> support will unmap the GHCB. Is the proper thing to do here to take the
> SRCU lock around the call to kvm_vcpu_unmap() in this case? It does fix
> the issue, but I just want to be sure that I shouldn't, instead, be taking
> the memslot lock:

I would rather avoid having long-lived maps, as I am working on removing
them from the Intel code.  However, it seems to me that the GHCB is almost
not used after sev_handle_vmgexit returns?

If so, there's no need to keep it mapped until pre_sev_es_run:

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a9d8d6aafdb8..b2226a5e249d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2200,9 +2200,6 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
  
  static void pre_sev_es_run(struct vcpu_svm *svm)
  {
-	if (!svm->ghcb)
-		return;
-
  	if (svm->ghcb_sa_free) {
  		/*
  		 * The scratch area lives outside the GHCB, so there is a
@@ -2220,13 +2217,6 @@ static void pre_sev_es_run(struct vcpu_svm *svm)
  		svm->ghcb_sa = NULL;
  		svm->ghcb_sa_free = false;
  	}
-
-	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
-
-	sev_es_sync_to_ghcb(svm);
-
-	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
-	svm->ghcb = NULL;
  }
  
  void pre_sev_run(struct vcpu_svm *svm, int cpu)
@@ -2465,7 +2455,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
  
  	ret = sev_es_validate_vmgexit(svm);
  	if (ret)
-		return ret;
+		goto out_unmap;
  
  	sev_es_sync_from_ghcb(svm);
  	ghcb_set_sw_exit_info_1(ghcb, 0);
@@ -2485,6 +2485,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
  		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
  		break;
  	case SVM_VMGEXIT_AP_HLT_LOOP:
+		ghcb_set_sw_exit_info_2(svm->ghcb, 1);
  		ret = kvm_emulate_ap_reset_hold(vcpu);
  		break;
  	case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -2531,6 +2521,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
  		ret = svm_invoke_exit_handler(vcpu, exit_code);
  	}
  
+	sev_es_sync_to_ghcb(svm);
+
+out_unmap:
+	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
+	svm->ghcb = NULL;
  	return ret;
  }
  
@@ -2619,21 +2620,4 @@ void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
  
  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
  {
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	/* First SIPI: Use the values as initially set by the VMM */
-	if (!svm->received_first_sipi) {
-		svm->received_first_sipi = true;
-		return;
-	}
-
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->ghcb)
-		return;
-
-	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
  }

However:

1) I admit I got lost in the maze starting with sev_es_string_io

2) upon an AP reset hold exit, the above patch sets the EXITINFO2 field
before the SIPI was received.  My understanding is that the processor will
not see the value anyway until it resumes execution, and why would other
vCPUs muck with the AP's GHCB.  But I'm not sure if it's okay.

Paolo
Tom Lendacky May 5, 2021, 6:39 p.m. UTC | #2
On 5/5/21 11:16 AM, Paolo Bonzini wrote:
> On 05/05/21 16:01, Tom Lendacky wrote:
>> Boris noticed the below warning when running an SEV-ES guest with
>> CONFIG_PROVE_LOCKING=y.
>>
>> The SRCU lock is released before invoking the vCPU run op where the SEV-ES
>> support will unmap the GHCB. Is the proper thing to do here to take the
>> SRCU lock around the call to kvm_vcpu_unmap() in this case? It does fix
>> the issue, but I just want to be sure that I shouldn't, instead, be taking
>> the memslot lock:
> 
> I would rather avoid having long-lived maps, as I am working on removing
> them from the Intel code.  However, it seems to me that the GHCB is almost
> not used after sev_handle_vmgexit returns?

Except for the cases you pointed out below, things like MMIO and IO. Anything
that has to exit to userspace to complete will still need the GHCB mapped
when coming back into the kernel if the shared buffer area of the GHCB is
being used.

Btw, what do you consider long-lived maps? Is having a map while context
switching back to userspace considered long lived? The GHCB will
(possibly) only be mapped on VMEXIT (VMGEXIT) and unmapped on the next
VMRUN for the vCPU. An AP reset hold could be a while, though.

> 
> If so, there's no need to keep it mapped until pre_sev_es_run:
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a9d8d6aafdb8..b2226a5e249d 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2200,9 +2200,6 @@ static int sev_es_validate_vmgexit(struct vcpu_svm
> *svm)
>  
>  static void pre_sev_es_run(struct vcpu_svm *svm)
>  {
> -    if (!svm->ghcb)
> -        return;
> -
>      if (svm->ghcb_sa_free) {
>          /*
>           * The scratch area lives outside the GHCB, so there is a
> @@ -2220,13 +2217,6 @@ static void pre_sev_es_run(struct vcpu_svm *svm)
>          svm->ghcb_sa = NULL;
>          svm->ghcb_sa_free = false;
>      }
> -
> -    trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
> -
> -    sev_es_sync_to_ghcb(svm);
> -
> -    kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
> -    svm->ghcb = NULL;
>  }
>  
>  void pre_sev_run(struct vcpu_svm *svm, int cpu)
> @@ -2465,7 +2455,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  
>      ret = sev_es_validate_vmgexit(svm);
>      if (ret)
> -        return ret;
> +        goto out_unmap;
>  
>      sev_es_sync_from_ghcb(svm);
>      ghcb_set_sw_exit_info_1(ghcb, 0);
> @@ -2485,6 +2485,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>          ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>          break;
>      case SVM_VMGEXIT_AP_HLT_LOOP:
> +        ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>          ret = kvm_emulate_ap_reset_hold(vcpu);
>          break;
>      case SVM_VMGEXIT_AP_JUMP_TABLE: {
> @@ -2531,6 +2521,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>          ret = svm_invoke_exit_handler(vcpu, exit_code);
>      }
>  
> +    sev_es_sync_to_ghcb(svm);
> +
> +out_unmap:
> +    kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
> +    svm->ghcb = NULL;
>      return ret;
>  }
>  
> @@ -2619,21 +2620,4 @@ void sev_es_prepare_guest_switch(struct vcpu_svm
> *svm, unsigned int cpu)
>  
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>  {
> -    struct vcpu_svm *svm = to_svm(vcpu);
> -
> -    /* First SIPI: Use the values as initially set by the VMM */
> -    if (!svm->received_first_sipi) {
> -        svm->received_first_sipi = true;
> -        return;
> -    }
> -
> -    /*
> -     * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
> -     * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
> -     * non-zero value.
> -     */
> -    if (!svm->ghcb)
> -        return;
> -
> -    ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>  }
> 
> However:
> 
> 1) I admit I got lost in the maze starting with sev_es_string_io

Yeah, kvm_sev_es_string_io bypasses the instruction emulation that would
normally be done to get register values and information about the port I/O
and supplies those directly. But in general, it follows the same pattern of
exiting to userspace to complete the port I/O operation.

> 
> 2) upon an AP reset hold exit, the above patch sets the EXITINFO2 field
> before the SIPI was received.  My understanding is that the processor will
> not see the value anyway until it resumes execution, and why would other
> vCPUs muck with the AP's GHCB.  But I'm not sure if it's okay.

As long as the vCPU might not be woken up for any reason other than a
SIPI, you can get away with this. But if it were to be woken up for some
other reason (an IPI for some reason?), then you wouldn't want the
non-zero value set in the GHCB in advance, because that might cause the
vCPU to exit the HLT loop it is in waiting for the actual SIPI.

Thanks,
Tom

> 
> Paolo
>
Paolo Bonzini May 5, 2021, 6:50 p.m. UTC | #3
On 05/05/21 20:39, Tom Lendacky wrote:
> On 5/5/21 11:16 AM, Paolo Bonzini wrote:
>> On 05/05/21 16:01, Tom Lendacky wrote:
>>> Boris noticed the below warning when running an SEV-ES guest with
>>> CONFIG_PROVE_LOCKING=y.
>>>
>>> The SRCU lock is released before invoking the vCPU run op where the SEV-ES
>>> support will unmap the GHCB. Is the proper thing to do here to take the
>>> SRCU lock around the call to kvm_vcpu_unmap() in this case? It does fix
>>> the issue, but I just want to be sure that I shouldn't, instead, be taking
>>> the memslot lock:
>>
>> I would rather avoid having long-lived maps, as I am working on removing
>> them from the Intel code.  However, it seems to me that the GHCB is almost
>> not used after sev_handle_vmgexit returns?
> 
> Except for as you pointed out below, things like MMIO and IO. Anything
> that has to exit to userspace to complete will still need the GHCB mapped
> when coming back into the kernel if the shared buffer area of the GHCB is
> being used.
> 
> Btw, what do you consider long lived maps?  Is having a map while context
> switching back to userspace considered long lived? The GHCB will
> (possibly) only be mapped on VMEXIT (VMGEXIT) and unmapped on the next
> VMRUN for the vCPU. An AP reset hold could be a while, though.

Anything that cannot be unmapped in the same function that maps it, 
essentially.

>> 2) upon an AP reset hold exit, the above patch sets the EXITINFO2 field
>> before the SIPI was received.  My understanding is that the processor will
>> not see the value anyway until it resumes execution, and why would other
>> vCPUs muck with the AP's GHCB.  But I'm not sure if it's okay.
> 
> As long as the vCPU might not be woken up for any reason other than a
> SIPI, you can get away with this. But if it were to be woken up for some
> other reason (an IPI for some reason?), then you wouldn't want the
> non-zero value set in the GHCB in advance, because that might cause the
> vCPU to exit the HLT loop it is in waiting for the actual SIPI.

Ok.  Then the best thing to do is to pull pre_sev_es_run() to the
prepare_guest_switch callback.

Paolo
Tom Lendacky May 5, 2021, 7:55 p.m. UTC | #4
On 5/5/21 1:50 PM, Paolo Bonzini wrote:
> On 05/05/21 20:39, Tom Lendacky wrote:
>> On 5/5/21 11:16 AM, Paolo Bonzini wrote:
>>> On 05/05/21 16:01, Tom Lendacky wrote:
>>>> Boris noticed the below warning when running an SEV-ES guest with
>>>> CONFIG_PROVE_LOCKING=y.
>>>>
>>>> The SRCU lock is released before invoking the vCPU run op where the
>>>> SEV-ES
>>>> support will unmap the GHCB. Is the proper thing to do here to take the
>>>> SRCU lock around the call to kvm_vcpu_unmap() in this case? It does fix
>>>> the issue, but I just want to be sure that I shouldn't, instead, be
>>>> taking
>>>> the memslot lock:
>>>
>>> I would rather avoid having long-lived maps, as I am working on removing
>>> them from the Intel code.  However, it seems to me that the GHCB is almost
>>> not used after sev_handle_vmgexit returns?
>>
>> Except for as you pointed out below, things like MMIO and IO. Anything
>> that has to exit to userspace to complete will still need the GHCB mapped
>> when coming back into the kernel if the shared buffer area of the GHCB is
>> being used.
>>
>> Btw, what do you consider long lived maps?  Is having a map while context
>> switching back to userspace considered long lived? The GHCB will
>> (possibly) only be mapped on VMEXIT (VMGEXIT) and unmapped on the next
>> VMRUN for the vCPU. An AP reset hold could be a while, though.
> 
> Anything that cannot be unmapped in the same function that maps it,
> essentially.
> 
>>> 2) upon an AP reset hold exit, the above patch sets the EXITINFO2 field
>>> before the SIPI was received.  My understanding is that the processor will
>>> not see the value anyway until it resumes execution, and why would other
>>> vCPUs muck with the AP's GHCB.  But I'm not sure if it's okay.
>>
>> As long as the vCPU might not be woken up for any reason other than a
>> SIPI, you can get away with this. But if it were to be woken up for some
>> other reason (an IPI for some reason?), then you wouldn't want the
>> non-zero value set in the GHCB in advance, because that might cause the
>> vCPU to exit the HLT loop it is in waiting for the actual SIPI.
> 
> Ok.  Then the best thing to do is to pull pre_sev_es_run() to the
> prepare_guest_switch callback.

A quick test of this failed (VMRUN failure); let me see what is going on
and post back.

Thanks,
Tom

> 
> Paolo
>
Tom Lendacky May 5, 2021, 9:05 p.m. UTC | #5
On 5/5/21 2:55 PM, Tom Lendacky wrote:
> On 5/5/21 1:50 PM, Paolo Bonzini wrote:
>> On 05/05/21 20:39, Tom Lendacky wrote:
>>> On 5/5/21 11:16 AM, Paolo Bonzini wrote:
>>>> On 05/05/21 16:01, Tom Lendacky wrote:
>>>>> Boris noticed the below warning when running an SEV-ES guest with
>>>>> CONFIG_PROVE_LOCKING=y.
>>>>>
>>>>> The SRCU lock is released before invoking the vCPU run op where the
>>>>> SEV-ES
>>>>> support will unmap the GHCB. Is the proper thing to do here to take the
>>>>> SRCU lock around the call to kvm_vcpu_unmap() in this case? It does fix
>>>>> the issue, but I just want to be sure that I shouldn't, instead, be
>>>>> taking
>>>>> the memslot lock:
>>>>
>>>> I would rather avoid having long-lived maps, as I am working on removing
>>>> them from the Intel code.  However, it seems to me that the GHCB is almost
>>>> not used after sev_handle_vmgexit returns?
>>>
>>> Except for as you pointed out below, things like MMIO and IO. Anything
>>> that has to exit to userspace to complete will still need the GHCB mapped
>>> when coming back into the kernel if the shared buffer area of the GHCB is
>>> being used.
>>>
>>> Btw, what do you consider long lived maps?  Is having a map while context
>>> switching back to userspace considered long lived? The GHCB will
>>> (possibly) only be mapped on VMEXIT (VMGEXIT) and unmapped on the next
>>> VMRUN for the vCPU. An AP reset hold could be a while, though.
>>
>> Anything that cannot be unmapped in the same function that maps it,
>> essentially.
>>
>>>> 2) upon an AP reset hold exit, the above patch sets the EXITINFO2 field
>>>> before the SIPI was received.  My understanding is that the processor will
>>>> not see the value anyway until it resumes execution, and why would other
>>>> vCPUs muck with the AP's GHCB.  But I'm not sure if it's okay.
>>>
>>> As long as the vCPU might not be woken up for any reason other than a
>>> SIPI, you can get away with this. But if it were to be woken up for some
>>> other reason (an IPI for some reason?), then you wouldn't want the
>>> non-zero value set in the GHCB in advance, because that might cause the
>>> vCPU to exit the HLT loop it is in waiting for the actual SIPI.
>>
>> Ok.  Then the best thing to do is to pull pre_sev_es_run() to the
>> prepare_guest_switch callback.
> 
> A quick test of this failed (VMRUN failure), let me see what is going on
> and post back.

I couldn't just move pre_sev_es_run() into sev_es_prepare_guest_switch()
because of the guest_state_loaded check in svm_prepare_guest_switch().

Renaming pre_sev_es_run() to sev_es_unmap_ghcb() and calling it before the
guest_state_loaded check worked nicely. I'll send a patch soon.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> Paolo
>>

Patch

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 17adc1e79136..76f90cf8c5aa 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2723,7 +2723,10 @@  static void pre_sev_es_run(struct vcpu_svm *svm)
 
 	sev_es_sync_to_ghcb(svm);
 
+	svm->vcpu.srcu_idx = srcu_read_lock(&svm->vcpu.kvm->srcu);
 	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
+	srcu_read_unlock(&svm->vcpu.kvm->srcu, svm->vcpu.srcu_idx);
+
 	svm->ghcb = NULL;
 }