Message ID | 565C0B1F.7070606@de.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 30 November 2015 at 00:38, Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 11/27/2015 09:42 PM, Tyler Baker wrote: >> On 27 November 2015 at 10:53, Tyler Baker <tyler.baker@linaro.org> wrote: >>> On 27 November 2015 at 09:08, Tyler Baker <tyler.baker@linaro.org> wrote: >>>> On 27 November 2015 at 00:54, Christian Borntraeger >>>> <borntraeger@de.ibm.com> wrote: >>>>> On 11/26/2015 09:47 PM, Christian Borntraeger wrote: >>>>>> On 11/26/2015 05:17 PM, Tyler Baker wrote: >>>>>>> Hi Christian, >>>>>>> >>>>>>> The kernelci.org bot recently has been reporting kvm guest boot >>>>>>> failures[1] on various arm64 platforms in next-20151126. The bot >>>>>>> bisected[2] the failures to the commit in -next titled "KVM: Create >>>>>>> debugfs dir and stat files for each VM". I confirmed by reverting this >>>>>>> commit on top of next-20151126 it resolves the boot issue. >>>>>>> >>>>>>> In this test case the host and guest are booted with the same kernel. >>>>>>> The host is booted over nfs, installs qemu (qemu-system arm64 2.4.0), >>>>>>> and launches a guest. The host is booting fine, but when the guest is >>>>>>> launched it errors with "Failed to retrieve host CPU features!". I >>>>>>> checked the host logs, and found an "Unable to handle kernel paging >>>>>>> request" splat[3] which occurs when the guest is attempting to start. >>>>>>> >>>>>>> I scanned the patch in question but nothing obvious jumped out at me, >>>>>>> any thoughts? >>>>>> >>>>>> Not really. >>>>>> Do you have processing running that do read the files in /sys/kernel/debug/kvm/* ? >>>>>> >>>>>> If I read the arm oops message correctly it oopsed inside >>>>>> __srcu_read_lock. there is actually nothing in there that can oops, >>>>>> except the access to the preempt count. I am just guessing right now, >>>>>> but maybe the preempt variable is no longer available (as the process >>>>>> is gone). As long as a debugfs file is open, we hold a reference to >>>>>> the kvm, which holds a reference to the mm, so the mm might be killed >>>>>> after the process. But this is supposed to work, so maybe its something >>>>>> different. An objdump of __srcu_read_lock might help. >>>>> >>>>> Hmm, the preempt thing is done in srcu_read_lock, but the crash is in >>>>> __srcu_read_lock. This function gets the srcu struct from mmu_notifier.c, >>>>> which must be present and is initialized during boot. >>>>> >>>>> >>>>> int __srcu_read_lock(struct srcu_struct *sp) >>>>> { >>>>> int idx; >>>>> >>>>> idx = READ_ONCE(sp->completed) & 0x1; >>>>> __this_cpu_inc(sp->per_cpu_ref->c[idx]); >>>>> smp_mb(); /* B */ /* Avoid leaking the critical section. */ >>>>> __this_cpu_inc(sp->per_cpu_ref->seq[idx]); >>>>> return idx; >>>>> } >>>>> >>>>> Looking at the code I have no clue why the patch does make a difference. >>>>> Can you try to get an objdump -S for__Srcu_read_lock? >>> >>> Some other interesting finding below... >>> >>> On the host, I do _not_ have any nodes under /sys/kernel/debug/kvm/ >>> >>> Running strace on the qemu command I use to launch the guest yields >>> the following. >>> >>> [pid 5963] 1448649724.405537 mmap(NULL, 65536, PROT_READ|PROT_WRITE, >>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6652a000 >>> [pid 5963] 1448649724.405586 read(13, "MemTotal: 16414616 >>> kB\nMemF"..., 1024) = 1024 >>> [pid 5963] 1448649724.405699 close(13) = 0 >>> [pid 5963] 1448649724.405755 munmap(0x7f6652a000, 65536) = 0 >>> [pid 5963] 1448649724.405947 brk(0x2552f000) = 0x2552f000 >>> [pid 5963] 1448649724.406148 openat(AT_FDCWD, "/dev/kvm", >>> O_RDWR|O_CLOEXEC) = 13 >>> [pid 5963] 1448649724.406209 ioctl(13, KVM_CREATE_VM, 0) = -1 ENOMEM >>> (Cannot allocate memory) >> >> If I comment the call to kvm_create_vm_debugfs(kvm) the guest boots >> fine. I put some printk's in the kvm_create_vm_debugfs() function and >> it's returning -ENOMEM after it evaluates !kvm->debugfs_dentry. I was >> chatting with some folks from the Linaro virtualization team and they >> mentioned that ARM is a bit special as the same PID creates two vms in >> quick succession, the first one is a scratch vm, and the other is the >> 'real' vm. With that bit of info, I suspect we may be trying to create >> the debugfs directory twice, and the second time it's failing because >> it already exists. > > Hmmm, with a patched QEMU that calls VM_CREATE twice it errors out on s390 > with -ENOMEM (which it should not), but it errors out gracefully. > > Does the attached patch avoid the crash? (guest will not start, but qemu > should error out gracefully with ENOMEM) Yeah. I patched my host kernel and now the qemu guest launch errors gracefully[1]. Cheers, Tyler [1] http://hastebin.com/rotiropayo.mel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 30/11/2015 09:38, Christian Borntraeger wrote: > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index f7d6c8f..b26472a 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -671,12 +671,16 @@ static struct kvm *kvm_create_vm(unsigned long type) > > r = kvm_create_vm_debugfs(kvm); > if (r) > - goto out_err; > + goto out_mmu; > > preempt_notifier_inc(); > > return kvm; > > +out_mmu: > +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) > + mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); > +#endif > out_err: > cleanup_srcu_struct(&kvm->irq_srcu); > out_err_no_irq_srcu: Looking at old unread email in my inbox... can you resubmit this to kvm@vger.kernel.org as a proper patch (Signed-off-by, commit message, etc.)? Thanks, Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/02/2016 06:15 PM, Paolo Bonzini wrote: > > > On 30/11/2015 09:38, Christian Borntraeger wrote: >> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index f7d6c8f..b26472a 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -671,12 +671,16 @@ static struct kvm *kvm_create_vm(unsigned long type) >> >> r = kvm_create_vm_debugfs(kvm); >> if (r) >> - goto out_err; >> + goto out_mmu; >> >> preempt_notifier_inc(); >> >> return kvm; >> >> +out_mmu: >> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) >> + mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); >> +#endif >> out_err: >> cleanup_srcu_struct(&kvm->irq_srcu); >> out_err_no_irq_srcu: > > Looking at old unread email in my inbox... can you resubmit this to > kvm@vger.kernel.org as a proper patch (Signed-off-by, commit message, etc.)? > This patch was only necessary to fixup an earlier version of Janosch's patch. upstream is fine. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f7d6c8f..b26472a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -671,12 +671,16 @@ static struct kvm *kvm_create_vm(unsigned long type) r = kvm_create_vm_debugfs(kvm); if (r) - goto out_err; + goto out_mmu; preempt_notifier_inc(); return kvm; +out_mmu: +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) + mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); +#endif out_err: cleanup_srcu_struct(&kvm->irq_srcu); out_err_no_irq_srcu: