Message ID: 20221222203021.1944101-2-mhal@rbox.co (mailing list archive)
State:      New, archived
Series:     [RFC,1/2] KVM: x86/xen: Fix use-after-free in kvm_xen_eventfd_update()
On 12/22/22 21:30, Michal Luczaj wrote:
> +	idx = srcu_read_lock(&kvm->srcu);
> +
>  	mutex_lock(&kvm->lock);
>  	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
>  	mutex_unlock(&kvm->lock);
>

This lock/unlock pair can cause a deadlock because it's inside the SRCU
read side critical section. Fortunately it's simpler to just use
mutex_lock for the whole function instead of using two small critical
sections, and then SRCU is not needed.

However, the same issue exists in kvm_xen_hcall_evtchn_send where
idr_find is not protected by kvm->lock. In that case, it is important
to use rcu_read_lock/unlock() around it, because idr_remove() will *not*
use synchronize_srcu() to wait for readers to complete.

Paolo
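For reference, the lookup pattern described above can be sketched roughly
as follows (field names are taken from the quoted code; this is an
illustrative sketch, not the final upstream change):

	/*
	 * Sketch: the idr lookup only needs a plain RCU read-side
	 * section, provided idr_remove() callers free entries after an
	 * RCU grace period rather than via synchronize_srcu().  The
	 * evtchnfd itself is assumed to stay alive because the vCPU
	 * hypercall path already holds kvm->srcu across the call.
	 */
	rcu_read_lock();
	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
	rcu_read_unlock();
	if (!evtchnfd)
		/* port not bound; fail the hypercall */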
On 12/24/22 09:52, Paolo Bonzini wrote:
> On 12/22/22 21:30, Michal Luczaj wrote:
>> +	idx = srcu_read_lock(&kvm->srcu);
>> +
>>  	mutex_lock(&kvm->lock);
>>  	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
>>  	mutex_unlock(&kvm->lock);
>>
>
> This lock/unlock pair can cause a deadlock because it's inside the SRCU
> read side critical section. Fortunately it's simpler to just use
> mutex_lock for the whole function instead of using two small critical
> sections, and then SRCU is not needed.

Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
a silly question, but how can this lock/unlock pair cause a deadlock? My
assumption was it's a *sleepable* RCU after all.

> However, the same issue exists in kvm_xen_hcall_evtchn_send where
> idr_find is not protected by kvm->lock. In that case, it is important
> to use rcu_read_lock/unlock() around it, because idr_remove() will *not*
> use synchronize_srcu() to wait for readers to complete.

Isn't kvm_xen_hcall_evtchn_send() already protected by srcu_read_lock()?

vcpu_enter_guest
  kvm_vcpu_srcu_read_lock    <--
  kvm_x86_handle_exit
    vmx_handle_exit
      __vmx_handle_exit
        (via kvm_vmx_exit_handlers) kvm_emulate_hypercall
          kvm_xen_hypercall
            kvm_xen_hcall_evtchn_send

Thanks,
Michal
On 12/24/22 12:14, Michal Luczaj wrote:
>> This lock/unlock pair can cause a deadlock because it's inside the SRCU
>> read side critical section. Fortunately it's simpler to just use
>> mutex_lock for the whole function instead of using two small critical
>> sections, and then SRCU is not needed.
> Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
> a silly question, but how can this lock/unlock pair cause a deadlock? My
> assumption was it's a *sleepable* RCU after all.
>

This is a pretty simple AB-BA deadlock, just involving SRCU and a mutex
instead of two mutexes:

    Thread 1                        Thread 2
                                    mutex_lock()
    srcu_read_lock()
    mutex_lock() // stops here
                                    synchronize_srcu() // stops here
                                    mutex_unlock()
    mutex_unlock()
    srcu_read_unlock()

Thread 2's synchronize_srcu() is waiting for srcu_read_unlock(), which
however won't happen until thread 1 can take the mutex. But the mutex
is taken by thread 2, hence the deadlock.

Paolo
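In code, the two sides of that diagram look roughly like this (a generic
kvm->lock/kvm->srcu sketch, not any particular KVM function):

	/* Thread 1: reader side (e.g. a vCPU), sketch */
	idx = srcu_read_lock(&kvm->srcu);
	mutex_lock(&kvm->lock);		/* blocks: thread 2 holds the mutex */
	...
	mutex_unlock(&kvm->lock);
	srcu_read_unlock(&kvm->srcu, idx);

	/* Thread 2: writer side (e.g. an ioctl), sketch */
	mutex_lock(&kvm->lock);
	synchronize_srcu(&kvm->srcu);	/* blocks: waits for thread 1's
					 * srcu_read_unlock(), which in turn
					 * needs the mutex first */
	mutex_unlock(&kvm->lock);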
On 12/27/22 12:11, Paolo Bonzini wrote:
> On 12/24/22 12:14, Michal Luczaj wrote:
>>> This lock/unlock pair can cause a deadlock because it's inside the SRCU
>>> read side critical section. Fortunately it's simpler to just use
>>> mutex_lock for the whole function instead of using two small critical
>>> sections, and then SRCU is not needed.
>> Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
>> a silly question, but how can this lock/unlock pair cause a deadlock? My
>> assumption was it's a *sleepable* RCU after all.
>>
>
> This is a pretty simple AB-BA deadlock, just involving SRCU and a mutex
> instead of two mutexes:
>
>     Thread 1                        Thread 2
>                                     mutex_lock()
>     srcu_read_lock()
>     mutex_lock() // stops here
>                                     synchronize_srcu() // stops here
>                                     mutex_unlock()
>     mutex_unlock()
>     srcu_read_unlock()
>
> Thread 2's synchronize_srcu() is waiting for srcu_read_unlock(), which
> however won't happen until thread 1 can take the mutex. But the mutex
> is taken by thread 2, hence the deadlock.

Ahh, I see. Thank you for explaining.

Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same fashion?

kvm_xen_eventfd_reset()
  mutex_lock()
                              srcu_read_lock()
                              kvm_xen_hcall_evtchn_send()
                                kvm_xen_set_evtchn()
                                  mutex_lock()
  synchronize_srcu()

Michal
On Wed, 2022-12-28 at 01:21 +0100, Michal Luczaj wrote:
> On 12/27/22 12:11, Paolo Bonzini wrote:
> > On 12/24/22 12:14, Michal Luczaj wrote:
> > > > This lock/unlock pair can cause a deadlock because it's inside the SRCU
> > > > read side critical section. Fortunately it's simpler to just use
> > > > mutex_lock for the whole function instead of using two small critical
> > > > sections, and then SRCU is not needed.
> > > Ah, I didn't think using a single mutex_lock would be ok. And maybe that's
> > > a silly question, but how can this lock/unlock pair cause a deadlock? My
> > > assumption was it's a *sleepable* RCU after all.
> > >
> >
> > This is a pretty simple AB-BA deadlock, just involving SRCU and a mutex
> > instead of two mutexes:
> >
> >     Thread 1                        Thread 2
> >                                     mutex_lock()
> >     srcu_read_lock()
> >     mutex_lock() // stops here
> >                                     synchronize_srcu() // stops here
> >                                     mutex_unlock()
> >     mutex_unlock()
> >     srcu_read_unlock()
> >
> > Thread 2's synchronize_srcu() is waiting for srcu_read_unlock(), which
> > however won't happen until thread 1 can take the mutex. But the mutex
> > is taken by thread 2, hence the deadlock.
>
> Ahh, I see. Thank you for explaining.
>
> Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same fashion?
>
> kvm_xen_eventfd_reset()
>   mutex_lock()
>                               srcu_read_lock()
>                               kvm_xen_hcall_evtchn_send()
>                                 kvm_xen_set_evtchn()
>                                   mutex_lock()
>   synchronize_srcu()
>
> Michal

Good catch. You'll note that above that mutex_lock() in
kvm_xen_set_evtchn() there's already a comment saying "if in future it
will be called directly from a vCPU thread"... which it is now.

	/*
	 * For the irqfd workqueue, using the main kvm->lock mutex is
	 * fine since this function is invoked from kvm_set_irq() with
	 * no other lock held, no srcu. In future if it will be called
	 * directly from a vCPU thread (e.g. on hypercall for an IPI)
	 * then it may need to switch to using a leaf-node mutex for
	 * serializing the shared_info mapping.
	 */
	mutex_lock(&kvm->lock);

I think that if we switch to a new lock rather than piggy-backing on
kvm->lock, we can declare that none shall call synchronize_srcu() while
holding it, and thus avoid the potential deadlock?

We have to fix the one case where the Xen code already does that, in
kvm_xen_eventfd_reset(). But fixing that one alone wouldn't solve it for
the general case and let us use kvm->lock, I presume.
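What such a leaf-node mutex could look like, as a rough sketch (the
xen_lock name is hypothetical here; the only rule it encodes is that
synchronize_srcu() is never called while it is held, so it may safely
nest inside srcu_read_lock(&kvm->srcu)):

	/* Hypothetical dedicated lock for the Xen state, sketch only. */
	struct mutex xen_lock;		/* would live in struct kvm_xen */

	/* vCPU/hypercall path, already inside srcu_read_lock(&kvm->srcu): */
	mutex_lock(&kvm->arch.xen.xen_lock);
	/* ... map/touch shared_info, look up evtchn_ports ... */
	mutex_unlock(&kvm->arch.xen.xen_lock);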
On 12/28/22 01:21, Michal Luczaj wrote:
> Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same fashion?
>
> kvm_xen_eventfd_reset()
>   mutex_lock()
>                               srcu_read_lock()
>                               kvm_xen_hcall_evtchn_send()
>                                 kvm_xen_set_evtchn()
>                                   mutex_lock()
>   synchronize_srcu()

Yes, I imagine that in practice you won't have running vCPUs during a
reset but the bug exists. Thanks!

Paolo
On Wed, 2022-12-28 at 10:39 +0100, Paolo Bonzini wrote:
> On 12/28/22 01:21, Michal Luczaj wrote:
> > Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same
> > fashion?
> >
> > kvm_xen_eventfd_reset()
> >   mutex_lock()
> >                               srcu_read_lock()
> >                               kvm_xen_hcall_evtchn_send()
> >                                 kvm_xen_set_evtchn()
> >                                   mutex_lock()
> >   synchronize_srcu()
>
> Yes, I imagine that in practice you won't have running vCPUs during a
> reset but the bug exists. Thanks!

If it's just kvm_xen_evtchn_reset() I can fix that — and have to
anyway, even if we switch the Xen code to its own lock.

But what is the general case lock ordering rule here? Can other code
call synchronize_srcu() while holding kvm->lock? Or is that verboten?
On 12/28/22 10:54, David Woodhouse wrote:
>> Yes, I imagine that in practice you won't have running vCPUs during a
>> reset but the bug exists. Thanks!
> If it's just kvm_xen_evtchn_reset() I can fix that — and have to
> anyway, even if we switch the Xen code to its own lock.
>
> But what is the general case lock ordering rule here? Can other code
> call synchronize_srcu() while holding kvm->lock? Or is that verboten?

Nope, it's a general rule---and one that would extend to any other lock
taken inside srcu_read_lock(&kvm->srcu).

I have sent a patch to fix reset, and one to clarify the lock ordering
rules.

Paolo
On 28 December 2022 11:58:56 GMT, Paolo Bonzini <pbonzini@redhat.com> wrote:
>On 12/28/22 10:54, David Woodhouse wrote:
>>> Yes, I imagine that in practice you won't have running vCPUs during a
>>> reset but the bug exists. Thanks!
>> If it's just kvm_xen_evtchn_reset() I can fix that — and have to
>> anyway, even if we switch the Xen code to its own lock.
>>
>> But what is the general case lock ordering rule here? Can other code
>> call synchronize_srcu() while holding kvm->lock? Or is that verboten?
>
>Nope, it's a general rule---and one that would extend to any other lock taken inside srcu_read_lock(&kvm->srcu).
>
>I have sent a patch to fix reset, and one to clarify the lock ordering rules.

Can we teach lockdep too?
On 12/28/22 13:35, David Woodhouse wrote:
>>> what is the general case lock ordering rule here? Can other code
>>> call synchronize_srcu() while holding kvm->lock? Or is that verboten?
>>
>> Nope, it's a general rule---and one that would extend to any other
>> lock taken inside srcu_read_lock(&kvm->srcu).
>>
>> I have sent a patch to fix reset, and one to clarify the lock
>> ordering rules.
>
> Can we teach lockdep too?

I think it's complicated. On one hand, synchronize_srcu() is very much
a back to back write_lock/write_unlock. In other words it would be easy
for lockdep to treat this:

    mutex_lock(A)
                              srcu_read_lock(B)
                              mutex_lock(A)
    synchronize_srcu(B)
                              srcu_read_unlock(B)

like this:

    exclusive_lock(A)
                              shared_lock_recursive(B)
                              exclusive_lock(A)
    exclusive_lock(B)
    exclusive_unlock(B)
                              shared_unlock_recursive(B)

On the other hand, srcu_read_lock() does not block so this is safe:

    mutex_lock(A)
                              srcu_read_lock(B)
                              mutex_lock(A)
    srcu_read_lock(B)

unlike the corresponding case where B is a rwlock/rwsem.

Paolo
On 12/28/22 12:58, Paolo Bonzini wrote:
> On 12/28/22 10:54, David Woodhouse wrote:
>> But what is the general case lock ordering rule here? Can other code
>> call synchronize_srcu() while holding kvm->lock? Or is that verboten?
>
> Nope, it's a general rule---and one that would extend to any other lock
> taken inside srcu_read_lock(&kvm->srcu).
>
> I have sent a patch to fix reset, and one to clarify the lock ordering
> rules.

It looks like there are more places with such bad ordering:
kvm_vm_ioctl_set_msr_filter(), kvm_vm_ioctl_set_pmu_event_filter().

Michal
On 12/29/22 03:12, Michal Luczaj wrote:
> It looks like there are more places with such bad ordering:
> kvm_vm_ioctl_set_msr_filter(), kvm_vm_ioctl_set_pmu_event_filter().

These are easy to fix because the unlock can just be moved before
synchronize_srcu() or synchronize_srcu_expedited(). Would you like to
send a patch?

Paolo
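The shape of that fix, sketched against kvm_vm_ioctl_set_msr_filter()
(approximate; the point is only that the mutex is dropped before waiting
for SRCU readers):

	mutex_lock(&kvm->lock);
	old_filter = rcu_replace_pointer(kvm->arch.msr_filter, new_filter,
					 mutex_is_locked(&kvm->lock));
	mutex_unlock(&kvm->lock);

	/*
	 * Wait for kvm->srcu readers only after dropping kvm->lock, so a
	 * vCPU inside srcu_read_lock(&kvm->srcu) can still take the mutex
	 * and eventually leave its read-side section.
	 */
	synchronize_srcu(&kvm->srcu);
	/* ... free old_filter ... */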
Protect `evtchnfd` by entering SRCU critical section.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 arch/x86/kvm/xen.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index d7af40240248..8e17629e5665 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1825,20 +1825,26 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
 {
 	u32 port = data->u.evtchn.send_port;
 	struct evtchnfd *evtchnfd;
+	int ret = -EINVAL;
+	int idx;
 
 	if (!port || port >= max_evtchn_port(kvm))
 		return -EINVAL;
 
+	idx = srcu_read_lock(&kvm->srcu);
+
 	mutex_lock(&kvm->lock);
 	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
 	mutex_unlock(&kvm->lock);
 
-	if (!evtchnfd)
-		return -ENOENT;
+	if (!evtchnfd) {
+		ret = -ENOENT;
+		goto out_rcu;
+	}
 
 	/* For an UPDATE, nothing may change except the priority/vcpu */
 	if (evtchnfd->type != data->u.evtchn.type)
-		return -EINVAL;
+		goto out_rcu; /* -EINVAL */
 
 	/*
 	 * Port cannot change, and if it's zero that was an eventfd
@@ -1846,11 +1852,11 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
 	 */
 	if (!evtchnfd->deliver.port.port ||
 	    evtchnfd->deliver.port.port != data->u.evtchn.deliver.port.port)
-		return -EINVAL;
+		goto out_rcu; /* -EINVAL */
 
 	/* We only support 2 level event channels for now */
 	if (data->u.evtchn.deliver.port.priority !=
 	    KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL)
-		return -EINVAL;
+		goto out_rcu; /* -EINVAL */
 
 	mutex_lock(&kvm->lock);
 	evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority;
@@ -1859,7 +1865,10 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
 		evtchnfd->deliver.port.vcpu_idx = -1;
 	}
 	mutex_unlock(&kvm->lock);
-	return 0;
+	ret = 0;
+out_rcu:
+	srcu_read_unlock(&kvm->srcu, idx);
+	return ret;
 }
 
 /*