From patchwork Mon Jun 6 14:19:09 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chen Yu
X-Patchwork-Id: 9158227
From: Chen Yu
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
 "H . Peter Anvin", linux-pm@vger.kernel.org, "Rafael J . Wysocki",
 Pavel Machek, Len Brown, Borislav Petkov, Peter Zijlstra,
 Zhu Guihua, Juergen Gross, Chen Yu
Subject: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
Date: Mon, 6 Jun 2016 22:19:09 +0800
Message-Id: <1465222749-8388-1-git-send-email-yu.c.chen@intel.com>
X-Mailer: git-send-email 2.7.4
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-pm@vger.kernel.org

A stress test from Varun Koyyalagunta shows that a nonboot CPU
occasionally hangs when resuming from hibernation. Further
investigation shows that the hang occurs when the nonboot CPU has been
woken up incorrectly and tries to monitor mwait_ptr for the second
time; an exception is then triggered by an illegal vaddr access,
something like 'Unable to handle kernel address of
0xffff8800ba800010...'.

One possible scenario for this issue, when the boot CPU tries to resume
from hibernation, is illustrated below:

1. The boot CPU puts the nonboot CPUs offline, so the nonboot CPUs are
   monitoring the address of task_struct.flags.
2. The boot CPU copies pages back to their original addresses, which
   include task_struct.flags, and thus wakes up one of the nonboot
   CPUs.
3. The nonboot CPU tries to monitor task_struct.flags again, but the
   page table for task_struct.flags has been overwritten by the boot
   CPU and the mapping has probably changed across hibernation (because
   of an inconsistent e820 memory map), so an exception is triggered.

As suggested by Rafael and Len, this patch monitors the zero page
instead of task_struct.flags when the CPU is put offline by the
hibernation resume process. The zero page should be safe because it is
located in .bss, and the page table for the kernel mapping of
text/data/bss should stay unchanged according to hibernation semantics.

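For illustration only, the intended choice of monitor address can be
modelled in user space roughly as follows. This is a sketch, not kernel
code: fake_thread_flags, fake_zero_page, fake_in_resume,
pick_monitor_addr() and offline_during_resume() are hypothetical
stand-ins for current_thread_info()->flags, empty_zero_page,
in_resume_hibernate, the pointer selection in mwait_play_dead() and the
flagged window in resume_target_kernel().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel symbols. */
static unsigned long fake_thread_flags;    /* models current_thread_info()->flags */
static unsigned char fake_zero_page[4096]; /* models empty_zero_page */
static bool fake_in_resume;                /* models in_resume_hibernate */

/* Pick the address an offlined CPU would arm its monitor on. */
static const void *pick_monitor_addr(void)
{
	if (fake_in_resume)
		return fake_zero_page;     /* not rewritten by image restore */
	return &fake_thread_flags;         /* normal CPU-offline path */
}

/* Models the window in which the resume path offlines the nonboot CPUs. */
static const void *offline_during_resume(void)
{
	const void *addr;

	assert(!fake_in_resume);           /* the window must not be nested */
	fake_in_resume = true;
	addr = pick_monitor_addr();        /* stands in for disable_nonboot_cpus() */
	fake_in_resume = false;
	return addr;
}
```

Outside the flagged window the model still returns the thread flags
address, so ordinary CPU hotplug is meant to be unaffected.
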
Reported-by: Varun Koyyalagunta
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Signed-off-by: Chen Yu
---
 arch/x86/kernel/smpboot.c | 16 +++++++++++++++-
 include/linux/suspend.h   |  7 +++++++
 kernel/power/hibernate.c  |  3 +++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fafe8b9..b2732ae 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1595,8 +1596,21 @@ static inline void mwait_play_dead(void)
 	 * This should be a memory location in a cache line which is
 	 * unlikely to be touched by other processors. The actual
 	 * content is immaterial as it is not actually modified in any way.
+	 *
+	 * However, in the hibernation resume process this address could be
+	 * touched by the BSP when restoring page frames. If the page table
+	 * for this address is not coherent across hibernation (due to
+	 * an inconsistent e820 memory map), access from the APs might
+	 * cause an exception. So change the mwait address to the zero page,
+	 * which is located in .bss. This way we avoid illegal
+	 * access from the APs, because the page table for the kernel mapping
+	 * of text/data/bss should stay unchanged according to
+	 * hibernation semantics.
 	 */
-	mwait_ptr = &current_thread_info()->flags;
+	if (hibernation_in_resume())
+		mwait_ptr = empty_zero_page;
+	else
+		mwait_ptr = &current_thread_info()->flags;

 	wbinvd();

diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 8b6ec7e..422e87a 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -384,6 +384,12 @@ extern bool system_entering_hibernation(void);
 extern bool hibernation_available(void);
 asmlinkage int swsusp_save(void);
 extern struct pbe *restore_pblist;
+extern bool in_resume_hibernate;
+
+static inline bool hibernation_in_resume(void)
+{
+	return in_resume_hibernate;
+}
 #else /* CONFIG_HIBERNATION */
 static inline void register_nosave_region(unsigned long b, unsigned long e) {}
 static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
@@ -395,6 +401,7 @@ static inline void hibernation_set_ops(const struct platform_hibernation_ops *op
 static inline int hibernate(void) { return -ENOSYS; }
 static inline bool system_entering_hibernation(void) { return false; }
 static inline bool hibernation_available(void) { return false; }
+static inline bool hibernation_in_resume(void) { return false; }
 #endif /* CONFIG_HIBERNATION */

 /* Hibernation and suspend events */
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index fca9254..13c229a 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -43,6 +43,7 @@ static char resume_file[256] = CONFIG_PM_STD_PARTITION;
 dev_t swsusp_resume_device;
 sector_t swsusp_resume_block;
 __visible int in_suspend __nosavedata;
+bool in_resume_hibernate;

 enum {
 	HIBERNATION_INVALID,
@@ -433,7 +434,9 @@ static int resume_target_kernel(bool platform_mode)
 	if (error)
 		goto Cleanup;

+	in_resume_hibernate = true;
 	error = disable_nonboot_cpus();
+	in_resume_hibernate = false;
 	if (error)
 		goto Enable_cpus;