From patchwork Tue Oct 3 13:36:51 2017
X-Patchwork-Submitter: Boqun Feng
X-Patchwork-Id: 9982847
From: Boqun Feng
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Boqun Feng, "Paul E. McKenney", Peter Zijlstra, Wanpeng Li,
    Paolo Bonzini, Radim Krčmář, Thomas Gleixner, Ingo Molnar, "H.
Peter Anvin", x86@kernel.org
Subject: [PATCH] kvm/x86: Avoid async PF to end RCU read-side critical section early in PREEMPT=n kernel
Date: Tue, 3 Oct 2017 21:36:51 +0800
Message-Id: <20171003133653.1178-1-boqun.feng@gmail.com>
X-Mailer: git-send-email 2.14.1
X-Mailing-List: kvm@vger.kernel.org

Currently, in a PREEMPT=n kernel, kvm_async_pf_task_wait() may call
schedule() to reschedule in some cases. If the async PF interrupted an
RCU read-side critical section, that schedule() ends the critical
section early, which can result in random memory corruption in the
guest.

The difficulty in handling this well is that in a PREEMPT_COUNT=n
kernel we cannot tell whether an async PF was delivered inside an RCU
read-side critical section, because rcu_read_lock()/rcu_read_unlock()
are just no-ops in that configuration.

To cure this, treat any async PF interrupting a kernel context as one
delivered inside an RCU read-side critical section, and do not allow
kvm_async_pf_task_wait() to choose the schedule() path in that case in
a PREEMPT_COUNT=n kernel, because that would introduce involuntary
context switches and break the assumptions RCU needs to work properly.

To do so, a second parameter is introduced for kvm_async_pf_task_wait()
so that we know whether it is called from a context that interrupted
the kernel, and that parameter is set properly at all the call sites.

Cc: "Paul E.
McKenney"
Cc: Peter Zijlstra
Cc: Wanpeng Li
Signed-off-by: Boqun Feng
---
 arch/x86/include/asm/kvm_para.h |  4 ++--
 arch/x86/kernel/kvm.c           | 14 ++++++++++----
 arch/x86/kvm/mmu.c              |  2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index bc62e7cbf1b1..59ad3d132353 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -88,7 +88,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 void __init kvm_guest_init(void);
-void kvm_async_pf_task_wait(u32 token);
+void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
@@ -103,7 +103,7 @@ static inline void kvm_spinlock_init(void)
 #else /* CONFIG_KVM_GUEST */
 
 #define kvm_guest_init() do {} while (0)
-#define kvm_async_pf_task_wait(T) do {} while(0)
+#define kvm_async_pf_task_wait(T, I) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
 
 static inline bool kvm_para_available(void)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e675704fa6f7..8bb9594d0761 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -117,7 +117,11 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
-void kvm_async_pf_task_wait(u32 token)
+/*
+ * @interrupt_kernel: Is this called from a routine which interrupts the kernel
+ * (other than user space)?
+ */
+void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -140,8 +144,10 @@ void kvm_async_pf_task_wait(u32 token)
 
 	n.token = token;
 	n.cpu = smp_processor_id();
-	n.halted = is_idle_task(current) || preempt_count() > 1 ||
-		   rcu_preempt_depth();
+	n.halted = is_idle_task(current) ||
+		   (IS_ENABLED(CONFIG_PREEMPT_COUNT)
+		    ? preempt_count() > 1 || rcu_preempt_depth()
+		    : interrupt_kernel);
 	init_swait_queue_head(&n.wq);
 	hlist_add_head(&n.link, &b->list);
 	raw_spin_unlock(&b->lock);
@@ -269,7 +275,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
 		prev_state = exception_enter();
-		kvm_async_pf_task_wait((u32)read_cr2());
+		kvm_async_pf_task_wait((u32)read_cr2(), !user_mode(regs));
 		exception_exit(prev_state);
 		break;
 	case KVM_PV_REASON_PAGE_READY:
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index eca30c1eb1d9..106d4a029a8a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3837,7 +3837,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		vcpu->arch.apf.host_apf_reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wait(fault_address);
+		kvm_async_pf_task_wait(fault_address, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY: