
KVM: Deal with nested sleeps in kvm_vcpu_block()

Message ID 20221130161946.3254953-1-spm@google.com (mailing list archive)
State New, archived

Commit Message

Space Meyer Nov. 30, 2022, 4:19 p.m. UTC
Previously this code assumed nothing would modify current->state
between the set_current_state() and schedule() calls. However, the call
to kvm_vcpu_check_block() in between might end up requiring locks or
other actions that change current->state. A similar pattern is
described in the "The problem with nested sleeping primitives" LWN
article [0].

[0] https://lwn.net/Articles/628628

Signed-off-by: Space Meyer <spm@google.com>
---
 virt/kvm/kvm_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Paolo Bonzini Nov. 30, 2022, 4:48 p.m. UTC | #1
On Wed, Nov 30, 2022 at 5:20 PM Space Meyer <spm@google.com> wrote:
> Previously this code assumed nothing would mess with current->state
> between the set_current_state() and schedule(). However the call to
> kvm_vcpu_check_block() in between might end up requiring locks or other
> actions, which would change current->state

This would be a bug (in particular kvm_arch_vcpu_runnable() and
kvm_cpu_has_pending_timer() should not need any lock). Do you
have a specific call stack in mind?

Paolo

>
> Signed-off-by: Space Meyer <spm@google.com>
> ---
>  virt/kvm/kvm_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fab4d37905785..64e10d73f2a92 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -32,6 +32,7 @@
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
>  #include <linux/sched/stat.h>
> +#include <linux/wait.h>
>  #include <linux/cpumask.h>
>  #include <linux/smp.h>
>  #include <linux/anon_inodes.h>
> @@ -3426,6 +3427,7 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>   */
>  bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
> +       DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
>         struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
>         bool waited = false;
>
> @@ -3437,13 +3439,11 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>         preempt_enable();
>
>         for (;;) {
> -               set_current_state(TASK_INTERRUPTIBLE);
> -
>                 if (kvm_vcpu_check_block(vcpu) < 0)
>                         break;
>
>                 waited = true;
> -               schedule();
> +               wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>         }
>
>         preempt_disable();
> --
> 2.38.1.584.g0f3c55d4c2-goog
>
Paolo Bonzini Nov. 30, 2022, 4:58 p.m. UTC | #2
On 11/30/22 17:19, Space Meyer wrote:
>   bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   {
> +	DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
>   	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
>   	bool waited = false;
>   
> @@ -3437,13 +3439,11 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   	preempt_enable();
>   
>   	for (;;) {
> -		set_current_state(TASK_INTERRUPTIBLE);
> -
>   		if (kvm_vcpu_check_block(vcpu) < 0)
>   			break;
>   
>   		waited = true;
> -		schedule();
> +		wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>   	}

Also, I think this does not work, because there is no
add_wait_queue()/remove_wait_queue() pair.  Adding it is not easy
because KVM is using a struct rcuwait here instead of a wait_queue_t.

Paolo
Space Meyer Dec. 2, 2022, 3:52 p.m. UTC | #3
On Wed, Nov 30, 2022 at 5:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/30/22 17:19, Space Meyer wrote:
> >   bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >   {
> > +     DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
> >       struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
> >       bool waited = false;
> >
> > @@ -3437,13 +3439,11 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >       preempt_enable();
> >
> >       for (;;) {
> > -             set_current_state(TASK_INTERRUPTIBLE);
> > -
> >               if (kvm_vcpu_check_block(vcpu) < 0)
> >                       break;
> >
> >               waited = true;
> > -             schedule();
> > +             wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
> >       }
>
> Also, this does not work I think, because there is
> add_wait_queue()/remove_wait_queue() pair.  Adding it is not easy
> because KVM is using a struct rcuwait here instead of a wait_queue_t.

Ah, sorry, I was a bit quick on this one. I agree that nothing would ever
call woken_wake_function, hence my patch doesn't make sense. Looking at
the rcuwait code, I don't see anything similar to wait_woken.

Do you see some other way we could avoid the pattern susceptible to the nested
sleeping problem?
Space Meyer Dec. 2, 2022, 3:52 p.m. UTC | #4
On Wed, Nov 30, 2022 at 5:49 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Wed, Nov 30, 2022 at 5:20 PM Space Meyer <spm@google.com> wrote:
> > Previously this code assumed nothing would mess with current->state
> > between the set_current_state() and schedule(). However the call to
> > kvm_vcpu_check_block() in between might end up requiring locks or other
> > actions, which would change current->state
>
> This would be a bug (in particular kvm_arch_vcpu_runnable() and
> kvm_cpu_has_pending_timer() should not need any lock). Do you
> have a specific call stack in mind?

You already fixed the specific call stack in 26844fe upstream. Syzkaller was
able to exercise the call stack you outlined in that commit on a 5.10 based
branch via:

------------[ cut here ]------------
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<00000000f941c5dd>] kvm_vcpu_block+0x330/0xaf0
WARNING: CPU: 1 PID: 2513 at kernel/sched/core.c:12350 __might_sleep+0xd7/0xe0
[...]
Call Trace:
__might_fault+0x6c/0x120
__kvm_read_guest_page+0x163/0x230
kvm_vcpu_read_guest+0xc3/0x150
read_and_check_msr_entry+0x3f/0x310
nested_vmx_store_msr+0x12c/0x360
prepare_vmcs12+0x5f2/0xd90
nested_vmx_vmexit+0x663/0x17e0
vmx_check_nested_events+0xfd8/0x1c60
kvm_arch_vcpu_runnable+0xda/0x6c0
kvm_vcpu_check_block+0x63/0x250
kvm_vcpu_block+0x3b7/0xaf0
[...]

The bug doesn't seem to be easily reproducible, but looking at the code,
this should also be applicable to the upstream 6.0, 5.15, 5.10, 5.4, 4.19
and 4.14 branches, which have not received a backport of 26844fe.

Do you think this is all we should do? My conclusion from the LWN article
was that we should avoid the set_current_state -> conditional -> schedule
pattern when possible as well.
Paolo Bonzini Dec. 2, 2022, 6:26 p.m. UTC | #5
On 12/2/22 16:52, Space Meyer wrote:
> The bug doesn't seem to be easily reproducible, but looking at the
> code, this should also be applicable to the upstream 6.0, 5.15, 5.10,
> 5.4, 4.19 and 4.14 branches, which have not received a backport of
> 26844fe.
> 
> Do you think this is all we should do? My conclusion from the LWN
> article was that we should avoid the set_current_state ->
> conditional -> schedule pattern when possible as well.

Yes, the bug is there, but unfortunately the backport is not easy. I
don't really feel confident backporting the fix to older kernels; even
though it is apparently just a couple of lines of code, event handling
is very delicate and has seen a lot of changes recently.

Paolo
Paolo Bonzini Dec. 2, 2022, 6:27 p.m. UTC | #6
On 12/2/22 16:52, Space Meyer wrote:
>> Also, I think this does not work, because there is no
>> add_wait_queue()/remove_wait_queue() pair.  Adding it is not easy
>> because KVM is using a struct rcuwait here instead of a wait_queue_t.
> Ah, sorry. I really was a bit quick on this one. I agree nothing would ever call
> woken_wake_function, hence my patch doesn't make sense. Looking at the rcuwait
> code I don't see something similar to wait_woken.
> 
> Do you see some other way we could avoid the pattern susceptible to the nested
> sleeping problem?

No, I think we should just treat them as bugs and fix them.

Paolo

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fab4d37905785..64e10d73f2a92 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -32,6 +32,7 @@ 
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/stat.h>
+#include <linux/wait.h>
 #include <linux/cpumask.h>
 #include <linux/smp.h>
 #include <linux/anon_inodes.h>
@@ -3426,6 +3427,7 @@  static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
  */
 bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
+	DEFINE_WAIT_FUNC(vcpu_block_wait, woken_wake_function);
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
 	bool waited = false;
 
@@ -3437,13 +3439,11 @@  bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	preempt_enable();
 
 	for (;;) {
-		set_current_state(TASK_INTERRUPTIBLE);
-
 		if (kvm_vcpu_check_block(vcpu) < 0)
 			break;
 
 		waited = true;
-		schedule();
+		wait_woken(&vcpu_block_wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 	}
 
 	preempt_disable();