
[RFC,1/2] KVM: x86/xen: Fix use-after-free in kvm_xen_eventfd_update()

Message ID 20221222203021.1944101-2-mhal@rbox.co (mailing list archive)
State New, archived

Commit Message

Michal Luczaj Dec. 22, 2022, 8:30 p.m. UTC
Protect `evtchnfd` by entering an SRCU read-side critical section.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 arch/x86/kvm/xen.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Paolo Bonzini Dec. 24, 2022, 8:52 a.m. UTC | #1
On 12/22/22 21:30, Michal Luczaj wrote:
> +	idx = srcu_read_lock(&kvm->srcu);
> +
>   	mutex_lock(&kvm->lock);
>   	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
>   	mutex_unlock(&kvm->lock);
>   

This lock/unlock pair can cause a deadlock because it's inside the SRCU 
read side critical section.  Fortunately it's simpler to just use 
mutex_lock for the whole function instead of using two small critical 
sections, and then SRCU is not needed.

However, the same issue exists in kvm_xen_hcall_evtchn_send where 
idr_find is not protected by kvm->lock.  In that case, it is important 
to use rcu_read_lock/unlock() around it, because idr_remove() will *not* 
use synchronize_srcu() to wait for readers to complete.
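Concretely, the lookup Paolo refers to would be wrapped like this (a sketch only, not the final patch; evtchnfd's own lifetime still depends on how the removal path frees it):

```c
	struct evtchnfd *evtchnfd;

	/*
	 * idr_remove() frees the IDR's internal nodes via regular RCU,
	 * so a plain RCU read-side critical section makes the lookup
	 * itself safe.
	 */
	rcu_read_lock();
	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
	rcu_read_unlock();
```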

Paolo
Michal Luczaj Dec. 24, 2022, 11:14 a.m. UTC | #2
On 12/24/22 09:52, Paolo Bonzini wrote:
> On 12/22/22 21:30, Michal Luczaj wrote:
>> +	idx = srcu_read_lock(&kvm->srcu);
>> +
>>   	mutex_lock(&kvm->lock);
>>   	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
>>   	mutex_unlock(&kvm->lock);
>>   
> 
> This lock/unlock pair can cause a deadlock because it's inside the SRCU 
> read side critical section.  Fortunately it's simpler to just use 
> mutex_lock for the whole function instead of using two small critical 
> sections, and then SRCU is not needed.

Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
a silly question, but how can this lock/unlock pair cause a deadlock? My
assumption was it's a *sleepable* RCU after all.

> However, the same issue exists in kvm_xen_hcall_evtchn_send where 
> idr_find is not protected by kvm->lock.  In that case, it is important 
> to use rcu_read_lock/unlock() around it, because idr_remove() will *not* 
> use synchronize_srcu() to wait for readers to complete.

Isn't kvm_xen_hcall_evtchn_send() already protected by srcu_read_lock()?

vcpu_enter_guest
  kvm_vcpu_srcu_read_lock		<--
  kvm_x86_handle_exit
    vmx_handle_exit
      __vmx_handle_exit (via kvm_vmx_exit_handlers)
        kvm_emulate_hypercall
          kvm_xen_hypercall
            kvm_xen_hcall_evtchn_send

Thanks,
Michal
Paolo Bonzini Dec. 27, 2022, 11:11 a.m. UTC | #3
On 12/24/22 12:14, Michal Luczaj wrote:
>> This lock/unlock pair can cause a deadlock because it's inside the SRCU
>> read side critical section.  Fortunately it's simpler to just use
>> mutex_lock for the whole function instead of using two small critical
>> sections, and then SRCU is not needed.
> Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
> a silly question, but how can this lock/unlock pair cause a deadlock? My
> assumption was it's a *sleepable* RCU after all.
> 

This is a pretty simple AB-BA deadlock, just involving SRCU and a mutex 
instead of two mutexes:

Thread 1			Thread 2
				mutex_lock()
srcu_read_lock()
mutex_lock()  // stops here
				synchronize_srcu()  // stops here
				mutex_unlock()
mutex_unlock()
srcu_read_unlock()

Thread 2's synchronize_srcu() is waiting for srcu_read_unlock(), which 
however won't happen until thread 1 can take the mutex.  But the mutex 
is taken by thread 2, hence the deadlock.

Paolo
Michal Luczaj Dec. 28, 2022, 12:21 a.m. UTC | #4
On 12/27/22 12:11, Paolo Bonzini wrote:
> On 12/24/22 12:14, Michal Luczaj wrote:
>>> This lock/unlock pair can cause a deadlock because it's inside the SRCU
>>> read side critical section.  Fortunately it's simpler to just use
>>> mutex_lock for the whole function instead of using two small critical
>>> sections, and then SRCU is not needed.
>> Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
>> a silly question, but how can this lock/unlock pair cause a deadlock? My
>> assumption was it's a *sleepable* RCU after all.
>>
> 
> This is a pretty simple AB-BA deadlock, just involving SRCU and a mutex 
> instead of two mutexes:
> 
> Thread 1			Thread 2
> 				mutex_lock()
> srcu_read_lock()
> mutex_lock()  // stops here
> 				synchronize_srcu()  // stops here
> 				mutex_unlock()
> mutex_unlock()
> srcu_read_unlock()
> 
> Thread 2's synchronize_srcu() is waiting for srcu_read_unlock(), which 
> however won't happen until thread 1 can take the mutex.  But the mutex 
> is taken by thread 2, hence the deadlock.

Ahh, I see. Thank you for explaining.

Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same fashion?

				kvm_xen_eventfd_reset()
				  mutex_lock()
srcu_read_lock()
kvm_xen_hcall_evtchn_send()
  kvm_xen_set_evtchn()
    mutex_lock()
    				  synchronize_srcu()

Michal
David Woodhouse Dec. 28, 2022, 9:32 a.m. UTC | #5
On Wed, 2022-12-28 at 01:21 +0100, Michal Luczaj wrote:
> On 12/27/22 12:11, Paolo Bonzini wrote:
> > On 12/24/22 12:14, Michal Luczaj wrote:
> > > > This lock/unlock pair can cause a deadlock because it's inside the SRCU
> > > > read side critical section.  Fortunately it's simpler to just use
> > > > mutex_lock for the whole function instead of using two small critical
> > > > sections, and then SRCU is not needed.
> > > Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
> > > a silly question, but how can this lock/unlock pair cause a deadlock? My
> > > assumption was it's a *sleepable* RCU after all.
> > > 
> > 
> > This is a pretty simple AB-BA deadlock, just involving SRCU and a mutex 
> > instead of two mutexes:
> > 
> > Thread 1                        Thread 2
> >                                 mutex_lock()
> > srcu_read_lock()
> > mutex_lock()  // stops here
> >                                 synchronize_srcu()  // stops here
> >                                 mutex_unlock()
> > mutex_unlock()
> > srcu_read_unlock()
> > 
> > Thread 2's synchronize_srcu() is waiting for srcu_read_unlock(), which 
> > however won't happen until thread 1 can take the mutex.  But the mutex 
> > is taken by thread 2, hence the deadlock.
> 
> Ahh, I see. Thank you for explaining.
> 
> Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same fashion?
> 
>                                 kvm_xen_eventfd_reset()
>                                   mutex_lock()
> srcu_read_lock()
> kvm_xen_hcall_evtchn_send()
>   kvm_xen_set_evtchn()
>     mutex_lock()
>                                   synchronize_srcu()
> 
> Michal

Good catch.

You'll note that above that mutex_lock() in kvm_xen_set_evtchn()
there's already a comment saying "if in future it will be called
directly from a vCPU thread"... which it is now.

	/*
	 * For the irqfd workqueue, using the main kvm->lock mutex is
	 * fine since this function is invoked from kvm_set_irq() with
	 * no other lock held, no srcu. In future if it will be called
	 * directly from a vCPU thread (e.g. on hypercall for an IPI)
	 * then it may need to switch to using a leaf-node mutex for
	 * serializing the shared_info mapping.
	 */
	mutex_lock(&kvm->lock);

I think that if we switch to a new lock rather than piggy-backing on
kvm->lock, we can declare that none shall call synchronize_srcu() while
holding it, and thus avoid the potential deadlock?

We have to fix the one case where the Xen code already does that, in
kvm_xen_eventfd_reset(). But fixing that one alone wouldn't solve it
for the general case and let us use kvm->lock, I presume.
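As a sketch of that proposal (the field name xen_lock and its placement are hypothetical here), the Xen state would get its own leaf mutex whose documented rule forbids synchronize_srcu() while held:

```c
	/*
	 * Hypothetical leaf lock for the Xen state.  Rule: holders must
	 * never call synchronize_srcu(&kvm->srcu), so this mutex may be
	 * taken inside srcu_read_lock(&kvm->srcu) sections, e.g. on the
	 * evtchn_send hypercall path.
	 */
	struct mutex xen_lock;

	/* lookup paths then serialize on it instead of kvm->lock: */
	mutex_lock(&kvm->arch.xen.xen_lock);
	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
	/* ... evtchnfd is stable while xen_lock is held ... */
	mutex_unlock(&kvm->arch.xen.xen_lock);
```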
Paolo Bonzini Dec. 28, 2022, 9:39 a.m. UTC | #6
On 12/28/22 01:21, Michal Luczaj wrote:
> Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same fashion?
> 
> 				kvm_xen_eventfd_reset()
> 				  mutex_lock()
> srcu_read_lock()
> kvm_xen_hcall_evtchn_send()
>    kvm_xen_set_evtchn()
>      mutex_lock()
>      				  synchronize_srcu()

Yes, I imagine that in practice you won't have running vCPUs during a 
reset but the bug exists.  Thanks!

Paolo
David Woodhouse Dec. 28, 2022, 9:54 a.m. UTC | #7
On Wed, 2022-12-28 at 10:39 +0100, Paolo Bonzini wrote:
> On 12/28/22 01:21, Michal Luczaj wrote:
> > Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same
> > fashion?
> > 
> >                                 kvm_xen_eventfd_reset()
> >                                   mutex_lock()
> > srcu_read_lock()
> > kvm_xen_hcall_evtchn_send()
> >     kvm_xen_set_evtchn()
> >       mutex_lock()
> >                                   synchronize_srcu()
> 
> Yes, I imagine that in practice you won't have running vCPUs during a
> reset but the bug exists.  Thanks!

If it's just kvm_xen_eventfd_reset() I can fix that — and have to
anyway, even if we switch the Xen code to its own lock.

But what is the general case lock ordering rule here? Can other code
call synchronize_srcu() while holding kvm->lock? Or is that verboten?
Paolo Bonzini Dec. 28, 2022, 11:58 a.m. UTC | #8
On 12/28/22 10:54, David Woodhouse wrote:
>> Yes, I imagine that in practice you won't have running vCPUs during a
>> reset but the bug exists.  Thanks!
> If it's just kvm_xen_eventfd_reset() I can fix that — and have to
> anyway, even if we switch the Xen code to its own lock.
> 
> But what is the general case lock ordering rule here? Can other code
> call synchronize_srcu() while holding kvm->lock? Or is that verboten?

Nope, it's a general rule---and one that would extend to any other lock 
taken inside srcu_read_lock(&kvm->srcu).

I have sent a patch to fix reset, and one to clarify the lock ordering 
rules.

Paolo
David Woodhouse Dec. 28, 2022, 12:35 p.m. UTC | #9
On 28 December 2022 11:58:56 GMT, Paolo Bonzini <pbonzini@redhat.com> wrote:
>On 12/28/22 10:54, David Woodhouse wrote:
>>> Yes, I imagine that in practice you won't have running vCPUs during a
>>> reset but the bug exists.  Thanks!
>> If it's just kvm_xen_eventfd_reset() I can fix that — and have to
>> anyway, even if we switch the Xen code to its own lock.
>> 
>> But what is the general case lock ordering rule here? Can other code
>> call synchronize_srcu() while holding kvm->lock? Or is that verboten?
>
>Nope, it's a general rule---and one that would extend to any other lock taken inside srcu_read_lock(&kvm->srcu).
>
>I have sent a patch to fix reset, and one to clarify the lock ordering rules.

Can we teach lockdep too?
Paolo Bonzini Dec. 28, 2022, 1:14 p.m. UTC | #10
On 12/28/22 13:35, David Woodhouse wrote:
>>> what is the general case lock ordering rule here? Can other code
>>> call synchronize_srcu() while holding kvm->lock? Or is that verboten?
>>
>> Nope, it's a general rule---and one that would extend to any other
>> lock taken inside srcu_read_lock(&kvm->srcu).
>> 
>> I have sent a patch to fix reset, and one to clarify the lock
>> ordering rules.
>
> Can we teach lockdep too?

I think it's complicated.  On the one hand, synchronize_srcu() is very much
a back-to-back write_lock/write_unlock.  In other words, it would be easy
for lockdep to treat this:

                                   mutex_lock(A)
     srcu_read_lock(B)
     mutex_lock(A)
                                   synchronize_srcu(B)
     srcu_read_unlock(B)

like this:

                                   exclusive_lock(A)
     shared_lock_recursive(B)
     exclusive_lock(A)
                                   exclusive_lock(B)
                                   exclusive_unlock(B)
     shared_unlock_recursive(B)


On the other hand, srcu_read_lock() does not block so this is safe:

                                   mutex_lock(A)
     srcu_read_lock(B)
     mutex_lock(A)
                                   srcu_read_lock(B)

unlike the corresponding case where B is a rwlock/rwsem.

Paolo
Michal Luczaj Dec. 29, 2022, 2:12 a.m. UTC | #11
On 12/28/22 12:58, Paolo Bonzini wrote:
> On 12/28/22 10:54, David Woodhouse wrote:
>> But what is the general case lock ordering rule here? Can other code
>> call synchronize_srcu() while holding kvm->lock? Or is that verboten?
> 
> Nope, it's a general rule---and one that would extend to any other lock 
> taken inside srcu_read_lock(&kvm->srcu).
> 
> I have sent a patch to fix reset, and one to clarify the lock ordering 
> rules.

It looks like there are more places with such bad ordering:
kvm_vm_ioctl_set_msr_filter(), kvm_vm_ioctl_set_pmu_event_filter().

Michal
Paolo Bonzini Dec. 29, 2022, 9:03 p.m. UTC | #12
On 12/29/22 03:12, Michal Luczaj wrote:
> It looks like there are more places with such bad ordering:
> kvm_vm_ioctl_set_msr_filter(), kvm_vm_ioctl_set_pmu_event_filter().

These are easy to fix because the unlock can just be moved before 
synchronize_srcu() or synchronize_srcu_expedited().  Would you like to 
send a patch?

Paolo

Patch

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index d7af40240248..8e17629e5665 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1825,20 +1825,26 @@  static int kvm_xen_eventfd_update(struct kvm *kvm,
 {
 	u32 port = data->u.evtchn.send_port;
 	struct evtchnfd *evtchnfd;
+	int ret = -EINVAL;
+	int idx;
 
 	if (!port || port >= max_evtchn_port(kvm))
 		return -EINVAL;
 
+	idx = srcu_read_lock(&kvm->srcu);
+
 	mutex_lock(&kvm->lock);
 	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
 	mutex_unlock(&kvm->lock);
 
-	if (!evtchnfd)
-		return -ENOENT;
+	if (!evtchnfd) {
+		ret = -ENOENT;
+		goto out_rcu;
+	}
 
 	/* For an UPDATE, nothing may change except the priority/vcpu */
 	if (evtchnfd->type != data->u.evtchn.type)
-		return -EINVAL;
+		goto out_rcu; /* -EINVAL */
 
 	/*
 	 * Port cannot change, and if it's zero that was an eventfd
@@ -1846,11 +1852,11 @@  static int kvm_xen_eventfd_update(struct kvm *kvm,
 	 */
 	if (!evtchnfd->deliver.port.port ||
 	    evtchnfd->deliver.port.port != data->u.evtchn.deliver.port.port)
-		return -EINVAL;
+		goto out_rcu; /* -EINVAL */
 
 	/* We only support 2 level event channels for now */
 	if (data->u.evtchn.deliver.port.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL)
-		return -EINVAL;
+		goto out_rcu; /* -EINVAL */
 
 	mutex_lock(&kvm->lock);
 	evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority;
@@ -1859,7 +1865,10 @@  static int kvm_xen_eventfd_update(struct kvm *kvm,
 		evtchnfd->deliver.port.vcpu_idx = -1;
 	}
 	mutex_unlock(&kvm->lock);
-	return 0;
+	ret = 0;
+out_rcu:
+	srcu_read_unlock(&kvm->srcu, idx);
+	return ret;
 }
 
 /*