Message ID | 168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() |

David Woodhouse <dwmw2@infradead.org> writes:

> From: David Woodhouse <dwmw@amazon.co.uk>
>
> In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
> making a final check whether the vCPU should be woken from HLT by any
> incoming interrupt.
>
> This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
> really shouldn't be sleeping when the task state has already been set.
> I think it's actually harmless as it would just manifest itself as a
> spurious wakeup, but it's causing a debug warning:
>
> [ 230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
>
> Fix the warning by turning it into an *explicit* spurious wakeup. When
> invoked with !task_is_running(current) (and we might as well add
> in_atomic() there while we're at it), just return 1 to indicate that
> an IRQ is pending, which will cause a wakeup and then something will
> call it again in a context that *can* sleep so it can fault the page
> back in.
>
> Cc: stable@vger.kernel.org
> Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
> ---
>  arch/x86/kvm/xen.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 9ea9c3dabe37..8f62baebd028 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -190,6 +190,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
>
>  int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>  {
> +	int err;
>  	u8 rc = 0;
>
>  	/*
> @@ -216,13 +217,29 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>  	if (likely(slots->generation == ghc->generation &&
>  		   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
>  		/* Fast path */
> -		__get_user(rc, (u8 __user *)ghc->hva + offset);
> -	} else {
> -		/* Slow path */
> -		kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> -					     sizeof(rc));
> +		pagefault_disable();
> +		err = __get_user(rc, (u8 __user *)ghc->hva + offset);
> +		pagefault_enable();

This reminds me of copy_from_user_nofault() -- can we use it instead maybe?

> +		if (!err)
> +			return rc;
>  	}
>
> +	/* Slow path */
> +
> +	/*
> +	 * This function gets called from kvm_vcpu_block() after setting the
> +	 * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately
> +	 * from a HLT. So we really mustn't sleep. If the page ended up absent
> +	 * at that point, just return 1 in order to trigger an immediate wake,
> +	 * and we'll end up getting called again from a context where we *can*
> +	 * fault in the page and wait for it.
> +	 */
> +	if (in_atomic() || !task_is_running(current))
> +		return 1;
> +
> +	kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> +				     sizeof(rc));
> +
>  	return rc;
>  }
>

On 23/10/21 21:47, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
> making a final check whether the vCPU should be woken from HLT by any
> incoming interrupt.
>
> This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
> really shouldn't be sleeping when the task state has already been set.
> I think it's actually harmless as it would just manifest itself as a
> spurious wakeup, but it's causing a debug warning:
>
> [ 230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
>
> Fix the warning by turning it into an *explicit* spurious wakeup. When
> invoked with !task_is_running(current) (and we might as well add
> in_atomic() there while we're at it), just return 1 to indicate that
> an IRQ is pending, which will cause a wakeup and then something will
> call it again in a context that *can* sleep so it can fault the page
> back in.
>
> Cc: stable@vger.kernel.org
> Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
> ---
>  arch/x86/kvm/xen.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 9ea9c3dabe37..8f62baebd028 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -190,6 +190,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
>
>  int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>  {
> +	int err;
>  	u8 rc = 0;
>
>  	/*
> @@ -216,13 +217,29 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>  	if (likely(slots->generation == ghc->generation &&
>  		   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
>  		/* Fast path */
> -		__get_user(rc, (u8 __user *)ghc->hva + offset);
> -	} else {
> -		/* Slow path */
> -		kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> -					     sizeof(rc));
> +		pagefault_disable();
> +		err = __get_user(rc, (u8 __user *)ghc->hva + offset);
> +		pagefault_enable();
> +		if (!err)
> +			return rc;
>  	}
>
> +	/* Slow path */
> +
> +	/*
> +	 * This function gets called from kvm_vcpu_block() after setting the
> +	 * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately
> +	 * from a HLT. So we really mustn't sleep. If the page ended up absent
> +	 * at that point, just return 1 in order to trigger an immediate wake,
> +	 * and we'll end up getting called again from a context where we *can*
> +	 * fault in the page and wait for it.
> +	 */
> +	if (in_atomic() || !task_is_running(current))
> +		return 1;
> +
> +	kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> +				     sizeof(rc));
> +
>  	return rc;
>  }
>

Queued, thanks.

Paolo

On Mon, 2021-10-25 at 13:28 +0200, Vitaly Kuznetsov wrote:
> > +		pagefault_disable();
> > +		err = __get_user(rc, (u8 __user *)ghc->hva + offset);
> > +		pagefault_enable();
>
> This reminds me of copy_from_user_nofault() -- can we use it instead maybe?

That's a lot of extra out of line function calls and redundant (I
believe) setup/checks, and would make the fast path fairly pointless
for the purpose it was *originally* introduced, which was to optimise
the case where we're entering the vCPU and just want to check
vcpu_info->evtchn_upcall_pending with a simple (fault-handled)
dereference.

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 9ea9c3dabe37..8f62baebd028 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -190,6 +190,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
 
 int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 {
+	int err;
 	u8 rc = 0;
 
 	/*
@@ -216,13 +217,29 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 	if (likely(slots->generation == ghc->generation &&
 		   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
 		/* Fast path */
-		__get_user(rc, (u8 __user *)ghc->hva + offset);
-	} else {
-		/* Slow path */
-		kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
-					     sizeof(rc));
+		pagefault_disable();
+		err = __get_user(rc, (u8 __user *)ghc->hva + offset);
+		pagefault_enable();
+		if (!err)
+			return rc;
 	}
 
+	/* Slow path */
+
+	/*
+	 * This function gets called from kvm_vcpu_block() after setting the
+	 * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately
+	 * from a HLT. So we really mustn't sleep. If the page ended up absent
+	 * at that point, just return 1 in order to trigger an immediate wake,
+	 * and we'll end up getting called again from a context where we *can*
+	 * fault in the page and wait for it.
+	 */
+	if (in_atomic() || !task_is_running(current))
+		return 1;
+
+	kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
+				     sizeof(rc));
+
 	return rc;
 }
 
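
For illustration only, the control flow this patch gives __kvm_xen_has_interrupt() can be modelled in plain userspace C. Nothing below is kernel code: struct model_page, struct model_ctx, read_nofault(), read_may_sleep() and model_has_interrupt() are invented names. The resident flag stands in for "the access would not fault", and can_sleep stands in for the negation of in_atomic() || !task_is_running(current). The memslot generation check that guards the real fast path is left out of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest page holding vcpu_info. */
struct model_page {
	bool resident;          /* false models a page whose access would fault */
	uint8_t upcall_pending; /* models vcpu_info->evtchn_upcall_pending */
};

/* Hypothetical stand-in for the calling context. */
struct model_ctx {
	bool can_sleep; /* false models in_atomic() || !task_is_running(current) */
};

/*
 * Models pagefault_disable(); __get_user(); pagefault_enable():
 * never sleeps, and reports an error instead of faulting the page in.
 */
int read_nofault(const struct model_page *pg, uint8_t *val)
{
	if (!pg->resident)
		return -1;
	*val = pg->upcall_pending;
	return 0;
}

/* Models the sleeping slow path: fault the page in, then read it. */
uint8_t read_may_sleep(struct model_page *pg, const struct model_ctx *ctx)
{
	/* Plays the role of the kernel's "blocking ops" debug check. */
	assert(ctx->can_sleep);
	pg->resident = true;
	return pg->upcall_pending;
}

/* Models the patched control flow. */
int model_has_interrupt(struct model_page *pg, const struct model_ctx *ctx)
{
	uint8_t rc = 0;

	/* Fast path: succeeds only if the page is already present. */
	if (!read_nofault(pg, &rc))
		return rc;

	/* Cannot sleep here: report "pending" to force an explicit wakeup. */
	if (!ctx->can_sleep)
		return 1;

	/* Slow path: allowed to fault the page in and wait for it. */
	return read_may_sleep(pg, ctx);
}
```

In this model, a call with a non-resident page and can_sleep == false returns 1 without touching the page, and a later call with can_sleep == true faults the page in and returns the real value. That is the "explicit spurious wakeup" described in the commit message.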