From patchwork Sun Oct 1 01:31:38 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boqun Feng
X-Patchwork-Id: 9979617
From: Boqun Feng
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Boqun Feng, "Paul E. McKenney", Peter Zijlstra, Wanpeng Li,
 Paolo Bonzini, =?UTF-8?q?Radim=20Kr=C4=8Dm=C3=A1=C5=99?=,
 Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org
Subject: [PATCH v2] kvm/x86: Handle async PF in RCU read-side critical sections
Date: Sun, 1 Oct 2017 09:31:38 +0800
Message-Id: <20171001013140.21325-1-boqun.feng@gmail.com>
X-Mailer: git-send-email 2.14.1
In-Reply-To: <20170929110148.3467-1-boqun.feng@gmail.com>
References: <20170929110148.3467-1-boqun.feng@gmail.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Sasha Levin reported a WARNING:

| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
...
| CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
| 1.10.1-1ubuntu1 04/01/2014
| Call Trace:
...
| RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
| RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
| RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
| RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
| RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
| R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
| R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
| __schedule+0x201/0x2240 kernel/sched/core.c:3292
| schedule+0x113/0x460 kernel/sched/core.c:3421
| kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
| do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
| async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
| RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
| RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
| RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
| RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
| RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
| R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
| R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
| vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
| sprintf+0xbe/0xf0 lib/vsprintf.c:2386
| proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
| get_link fs/namei.c:1047 [inline]
| link_path_walk+0x1041/0x1490 fs/namei.c:2127
...

This happened when the host hit a page fault and delivered it as an
async page fault while the guest was in an RCU read-side critical
section. The guest then tries to reschedule in
kvm_async_pf_task_wait(), but rcu_preempt_note_context_switch() treats
the reschedule as a sleep in an RCU read-side critical section, which
is not allowed (even in preemptible RCU). Thus the WARN.

To cure this, we need to make kvm_async_pf_task_wait() take the halt
path (instead of the schedule path) if the PF happens in an RCU
read-side critical section.

In a PREEMPT=y kernel this is simple, as rcu_preempt_depth() reports
the current nesting level of RCU read-side critical sections. But in a
PREEMPT=n kernel rcu_read_lock/unlock() may be no-ops, so we don't know
whether we are in an RCU read-side critical section or not. We resolve
this by always choosing the halt path in a PREEMPT=n kernel unless the
guest gets the async PF while running in user mode.

Reported-by: Sasha Levin
Cc: "Paul E. McKenney"
Cc: Peter Zijlstra
Cc: Wanpeng Li
[The explanation for async PF is contributed by Paolo Bonzini]
Signed-off-by: Boqun Feng
---
v1 --> v2:

*	Add a more accurate explanation of async PF from Paolo in the
	commit message.

*	Extend kvm_async_pf_task_wait() with a second parameter @user
	to indicate whether the fault happens while a user program is
	running in the guest.

Wanpeng, the callsite of kvm_async_pf_task_wait() in
kvm_handle_page_fault() is for the nested scenario, right?
I take it we should handle it as if the fault happens when l1 guest is
running in kernel mode, so @user should be 0, right?

 arch/x86/include/asm/kvm_para.h | 4 ++--
 arch/x86/kernel/kvm.c           | 9 ++++++---
 arch/x86/kvm/mmu.c              | 2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index bc62e7cbf1b1..0a5ae6bb128b 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -88,7 +88,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 void __init kvm_guest_init(void);
-void kvm_async_pf_task_wait(u32 token);
+void kvm_async_pf_task_wait(u32 token, int user);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
@@ -103,7 +103,7 @@ static inline void kvm_spinlock_init(void)
 
 #else /* CONFIG_KVM_GUEST */
 #define kvm_guest_init() do {} while (0)
-#define kvm_async_pf_task_wait(T) do {} while(0)
+#define kvm_async_pf_task_wait(T, U) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
 
 static inline bool kvm_para_available(void)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index aa60a08b65b1..916f519e54c9 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -117,7 +117,7 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
-void kvm_async_pf_task_wait(u32 token)
+void kvm_async_pf_task_wait(u32 token, int user)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -140,7 +140,10 @@ void kvm_async_pf_task_wait(u32 token)
 
 	n.token = token;
 	n.cpu = smp_processor_id();
-	n.halted = is_idle_task(current) || preempt_count() > 1;
+	n.halted = is_idle_task(current) ||
+		   preempt_count() > 1 ||
+		   (!IS_ENABLED(CONFIG_PREEMPT) && !user) ||
+		   rcu_preempt_depth();
 	init_swait_queue_head(&n.wq);
 	hlist_add_head(&n.link, &b->list);
 	raw_spin_unlock(&b->lock);
@@ -268,7 +271,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
 		prev_state = exception_enter();
-		kvm_async_pf_task_wait((u32)read_cr2());
+		kvm_async_pf_task_wait((u32)read_cr2(), user_mode(regs));
 		exception_exit(prev_state);
 		break;
 	case KVM_PV_REASON_PAGE_READY:
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index eca30c1eb1d9..106d4a029a8a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3837,7 +3837,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		vcpu->arch.apf.host_apf_reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wait(fault_address);
+		kvm_async_pf_task_wait(fault_address, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
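
Note for reviewers: the new n.halted condition can be tried out in
userspace with a small model like the one below. This is only a sketch
and not kernel code. struct apf_ctx, its fields and apf_must_halt() are
hypothetical names that stand in for is_idle_task(current),
preempt_count(), rcu_preempt_depth(), IS_ENABLED(CONFIG_PREEMPT) and
the @user argument. It assumes, as the commit message says, that a
PREEMPT=n kernel cannot report RCU nesting, so rcu_depth is expected to
be 0 there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the guest state when the async PF arrives. */
struct apf_ctx {
	bool idle_task;		/* models is_idle_task(current) */
	int preempt_count;	/* models preempt_count() */
	int rcu_depth;		/* models rcu_preempt_depth() */
	bool preempt_kernel;	/* models IS_ENABLED(CONFIG_PREEMPT) */
	bool user;		/* models the @user argument */
};

/*
 * Returns true when the waiter has to take the halt path, false when
 * it may take the schedule path. Mirrors the n.halted expression in
 * the kvm.c hunk of this patch.
 */
static bool apf_must_halt(const struct apf_ctx *c)
{
	assert(c->preempt_count >= 0 && c->rcu_depth >= 0);

	return c->idle_task ||
	       c->preempt_count > 1 ||
	       (!c->preempt_kernel && !c->user) ||
	       c->rcu_depth > 0;
}
```

With this model, a PREEMPT=y guest that faults in kernel mode inside
rcu_read_lock() (rcu_depth == 1) halts, while the same fault outside
any RCU read-side critical section may schedule. A PREEMPT=n guest
halts for every kernel-mode fault and may schedule only when the fault
came from user mode.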