Message ID | 20221130161946.3254953-1-spm@google.com (mailing list archive)
---|---
State | New, archived
Series | KVM: Deal with nested sleeps in kvm_vcpu_block()
On Wed, Nov 30, 2022 at 5:20 PM Space Meyer <spm@google.com> wrote:
> Previously this code assumed nothing would mess with current->state
> between the set_current_state() and schedule(). However the call to
> kvm_vcpu_check_block() in between might end up requiring locks or other
> actions, which would change current->state

This would be a bug (in particular kvm_arch_vcpu_runnable() and
kvm_cpu_has_pending_timer() should not need any lock). Do you
have a specific call stack in mind?

Paolo

>
> Signed-off-by: Space Meyer <spm@google.com>
> ---
>  virt/kvm/kvm_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fab4d37905785..64e10d73f2a92 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -32,6 +32,7 @@
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
>  #include <linux/sched/stat.h>
> +#include <linux/wait.h>
>  #include <linux/cpumask.h>
>  #include <linux/smp.h>
>  #include <linux/anon_inodes.h>
> @@ -3426,6 +3427,7 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>   */
>  bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
> +	DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
>  	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
>  	bool waited = false;
>
> @@ -3437,13 +3439,11 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	preempt_enable();
>
>  	for (;;) {
> -		set_current_state(TASK_INTERRUPTIBLE);
> -
>  		if (kvm_vcpu_check_block(vcpu) < 0)
>  			break;
>
>  		waited = true;
> -		schedule();
> +		wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>  	}
>
>  	preempt_disable();
> --
> 2.38.1.584.g0f3c55d4c2-goog
>
On 11/30/22 17:19, Space Meyer wrote:
>  bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
> +	DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
>  	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
>  	bool waited = false;
>
> @@ -3437,13 +3439,11 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	preempt_enable();
>
>  	for (;;) {
> -		set_current_state(TASK_INTERRUPTIBLE);
> -
>  		if (kvm_vcpu_check_block(vcpu) < 0)
>  			break;
>
>  		waited = true;
> -		schedule();
> +		wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>  	}

Also, this does not work I think, because there is no
add_wait_queue()/remove_wait_queue() pair. Adding it is not easy
because KVM is using a struct rcuwait here instead of a wait_queue_t.

Paolo
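For reference, the idiom wait_woken() is built around — the add_wait_queue()/remove_wait_queue() pair the patch is missing — looks roughly like this (a generic sketch of the usual wait-queue usage, not KVM code; `wq_head` and `condition` are placeholders):

```c
DEFINE_WAIT_FUNC(wait, woken_wake_function);

add_wait_queue(&wq_head, &wait);
while (!condition) {
	/*
	 * wait_woken() avoids the nested-sleep race only because a
	 * waker reaches woken_wake_function() through the wait queue
	 * and marks the entry as woken. Without add_wait_queue(),
	 * nothing ever does that, so the entry sleeps until the
	 * timeout (here MAX_SCHEDULE_TIMEOUT, i.e. forever).
	 */
	wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
remove_wait_queue(&wq_head, &wait);
```

Since kvm_vcpu_block() wakes the vCPU through a struct rcuwait rather than a wait queue, there is no wq_head to attach the entry to, which is the problem Paolo points out.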
On Wed, Nov 30, 2022 at 5:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/30/22 17:19, Space Meyer wrote:
> >  bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  {
> > +	DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
> >  	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
> >  	bool waited = false;
> >
> > @@ -3437,13 +3439,11 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  	preempt_enable();
> >
> >  	for (;;) {
> > -		set_current_state(TASK_INTERRUPTIBLE);
> > -
> >  		if (kvm_vcpu_check_block(vcpu) < 0)
> >  			break;
> >
> >  		waited = true;
> > -		schedule();
> > +		wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
> >  	}
>
> Also, this does not work I think, because there is no
> add_wait_queue()/remove_wait_queue() pair. Adding it is not easy
> because KVM is using a struct rcuwait here instead of a wait_queue_t.

Ah, sorry. I really was a bit quick on this one. I agree nothing would
ever call woken_wake_function, hence my patch doesn't make sense.
Looking at the rcuwait code I don't see something similar to
wait_woken.

Do you see some other way we could avoid the pattern susceptible to the
nested sleeping problem?
On Wed, Nov 30, 2022 at 5:49 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Wed, Nov 30, 2022 at 5:20 PM Space Meyer <spm@google.com> wrote:
> > Previously this code assumed nothing would mess with current->state
> > between the set_current_state() and schedule(). However the call to
> > kvm_vcpu_check_block() in between might end up requiring locks or other
> > actions, which would change current->state
>
> This would be a bug (in particular kvm_arch_vcpu_runnable() and
> kvm_cpu_has_pending_timer() should not need any lock). Do you
> have a specific call stack in mind?

You already fixed the specific call stack in 26844fe upstream. Syzkaller
was able to exercise the call stack you outlined in that commit on a
5.10-based branch via:

------------[ cut here ]------------
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<00000000f941c5dd>] kvm_vcpu_block+0x330/0xaf0
WARNING: CPU: 1 PID: 2513 at kernel/sched/core.c:12350 __might_sleep+0xd7/0xe0
[...]
Call Trace:
 __might_fault+0x6c/0x120
 __kvm_read_guest_page+0x163/0x230
 kvm_vcpu_read_guest+0xc3/0x150
 read_and_check_msr_entry+0x3f/0x310
 nested_vmx_store_msr+0x12c/0x360
 prepare_vmcs12+0x5f2/0xd90
 nested_vmx_vmexit+0x663/0x17e0
 vmx_check_nested_events+0xfd8/0x1c60
 kvm_arch_vcpu_runnable+0xda/0x6c0
 kvm_vcpu_check_block+0x63/0x250
 kvm_vcpu_block+0x3b7/0xaf0
[...]

The bug doesn't seem to be easily reproducible, but looking at the code
this should also be applicable for the upstream 6.0, 5.15, 5.10, 5.4,
4.19 and 4.14 branches, which have not received a backport of 26844fe.

Do you think this is all we should do? My conclusion from the LWN
article was that we should avoid the set_current_state -> conditional ->
schedule pattern when possible as well.
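For readers following the thread, the fragile pattern under discussion is the kvm_vcpu_block() loop quoted earlier, reproduced here with the hazard annotated (a simplified sketch of the pre-fix code, not the exact upstream source):

```c
for (;;) {
	set_current_state(TASK_INTERRUPTIBLE);

	/*
	 * If anything reached from here blocks (as in the
	 * __might_fault stack above, where kvm_arch_vcpu_runnable()
	 * ends up reading guest memory), the inner sleep resets
	 * current->state to TASK_RUNNING. A wakeup arriving in the
	 * window between the check and schedule() can then be lost,
	 * or schedule() may not actually sleep.
	 */
	if (kvm_vcpu_check_block(vcpu) < 0)
		break;

	waited = true;
	schedule();
}
__set_current_state(TASK_RUNNING);
```

This is the "nested sleeping primitives" problem from the LWN article the submitter cites: the outer set_current_state()/schedule() pair silently assumes nothing in between touches the task state.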
On 12/2/22 16:52, Space Meyer wrote:
> The bug doesn't seem to be easily reproducible, but looking at the
> code this should also be applicable for the upstream 6.0, 5.15, 5.10,
> 5.4, 4.19 and 4.14 branches, which have not received a backport of
> 26844fe.
>
> Do you think this is all we should do? My conclusion from the LWN
> article was that we should avoid the set_current_state ->
> conditional -> schedule pattern when possible as well.

Yes, the bug is there but unfortunately the backport is not easy. I
don't really feel confident backporting the fix to older kernels; even
if it is apparently just a couple lines of code, event handling is very
delicate and has had a lot of changes recently.

Paolo
On 12/2/22 16:52, Space Meyer wrote:
>> Also, this does not work I think, because there is no
>> add_wait_queue()/remove_wait_queue() pair. Adding it is not easy
>> because KVM is using a struct rcuwait here instead of a wait_queue_t.
> Ah, sorry. I really was a bit quick on this one. I agree nothing would
> ever call woken_wake_function, hence my patch doesn't make sense.
> Looking at the rcuwait code I don't see something similar to
> wait_woken.
>
> Do you see some other way we could avoid the pattern susceptible to
> the nested sleeping problem?

No, I think we should just treat them as bugs and fix them.

Paolo
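The "treat them as bugs" direction — which, per the thread, is what the upstream 26844fe fix did for the syzkaller stack — means keeping the set_current_state()/schedule() pair and instead guaranteeing that kvm_vcpu_check_block() never sleeps. Schematically it looks something like this (a rough sketch of the idea only, not the literal upstream diff):

```c
/*
 * Anything that may fault or take sleeping locks (e.g. the nested-VMX
 * event handling that read guest memory in the stack trace above) is
 * done up front, while current->state is still TASK_RUNNING.
 */
do_sleepable_preparation(vcpu);		/* hypothetical placeholder */

for (;;) {
	set_current_state(TASK_INTERRUPTIBLE);

	/*
	 * From here until schedule(), only non-sleeping checks are
	 * allowed; kvm_vcpu_check_block() must not touch
	 * current->state, so no wakeup can be lost.
	 */
	if (kvm_vcpu_check_block(vcpu) < 0)
		break;

	waited = true;
	schedule();
}
__set_current_state(TASK_RUNNING);
```

The trade-off is that every future change to kvm_vcpu_check_block() and its callees must preserve the non-sleeping invariant — which CONFIG_DEBUG_ATOMIC_SLEEP's "do not call blocking ops when !TASK_RUNNING" warning, seen in the syzkaller report, exists to police.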
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fab4d37905785..64e10d73f2a92 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -32,6 +32,7 @@
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/stat.h>
+#include <linux/wait.h>
 #include <linux/cpumask.h>
 #include <linux/smp.h>
 #include <linux/anon_inodes.h>
@@ -3426,6 +3427,7 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
  */
 bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
+	DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
 	bool waited = false;

@@ -3437,13 +3439,11 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	preempt_enable();

 	for (;;) {
-		set_current_state(TASK_INTERRUPTIBLE);
-
 		if (kvm_vcpu_check_block(vcpu) < 0)
 			break;

 		waited = true;
-		schedule();
+		wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 	}

 	preempt_disable();
Previously this code assumed nothing would mess with current->state
between the set_current_state() and schedule(). However the call to
kvm_vcpu_check_block() in between might end up requiring locks or other
actions, which would change current->state. A similar pattern was
described in the "The problem with nested sleeping primitives" LWN
article[0].

[0] https://lwn.net/Articles/628628

Signed-off-by: Space Meyer <spm@google.com>
---
 virt/kvm/kvm_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)