From patchwork Sat Apr 17 09:40:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12209613 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50FA5C43461 for ; Sat, 17 Apr 2021 09:41:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DB6E061026 for ; Sat, 17 Apr 2021 09:41:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DB6E061026 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2BE196B0070; Sat, 17 Apr 2021 05:41:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 295496B0071; Sat, 17 Apr 2021 05:41:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 15CC46B0072; Sat, 17 Apr 2021 05:41:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0230.hostedemail.com [216.40.44.230]) by kanga.kvack.org (Postfix) with ESMTP id E52086B0070 for ; Sat, 17 Apr 2021 05:41:49 -0400 (EDT) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id A92208248D51 for ; Sat, 17 Apr 2021 09:41:49 +0000 (UTC) X-FDA: 78041367138.04.A6EC4E6 Received: from szxga06-in.huawei.com 
(szxga06-in.huawei.com [45.249.212.32]) by imf11.hostedemail.com (Postfix) with ESMTP id CD7DE2000254 for ; Sat, 17 Apr 2021 09:41:37 +0000 (UTC) Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FMp2b23DGzlYHb; Sat, 17 Apr 2021 17:39:51 +0800 (CST) Received: from huawei.com (10.175.104.175) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.498.0; Sat, 17 Apr 2021 17:41:37 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , Subject: [PATCH v2 1/5] mm/swapfile: add percpu_ref support for swap Date: Sat, 17 Apr 2021 05:40:35 -0400 Message-ID: <20210417094039.51711-2-linmiaohe@huawei.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20210417094039.51711-1-linmiaohe@huawei.com> References: <20210417094039.51711-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.175] X-CFilter-Loop: Reflected X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: CD7DE2000254 X-Stat-Signature: gnxf4chiby9pnem1w6x5ojnq6575e7e3 Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf11; identity=mailfrom; envelope-from=""; helo=szxga06-in.huawei.com; client-ip=45.249.212.32 X-HE-DKIM-Result: none/none X-HE-Tag: 1618652497-231372 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

We will use percpu-refcount to serialize against concurrent swapoff. This
patch adds the percpu_ref support for swap.

Signed-off-by: Miaohe Lin
---
 include/linux/swap.h |  3 +++
 mm/swapfile.c        | 33 +++++++++++++++++++++++++++++----
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 144727041e78..8be36eb58b7a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -240,6 +240,7 @@ struct swap_cluster_list {
  * The in-memory structure used to track swap areas.
  */
 struct swap_info_struct {
+	struct percpu_ref users; /* serialization against concurrent swapoff */
 	unsigned long flags; /* SWP_USED etc: see above */
 	signed short prio; /* swap priority of this type */
 	struct plist_node list; /* entry in swap_active_head */
@@ -260,6 +261,8 @@ struct swap_info_struct {
 	struct block_device *bdev; /* swap device or bdev of swap file */
 	struct file *swap_file; /* seldom referenced */
 	unsigned int old_block_size; /* seldom referenced */
+	bool ref_initialized; /* seldom referenced */
+	struct completion comp; /* seldom referenced */
 #ifdef CONFIG_FRONTSWAP
 	unsigned long *frontswap_map; /* frontswap in-use, one bit per page */
 	atomic_t frontswap_pages; /* frontswap pages in-use counter */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 149e77454e3c..66515a3a2824 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -511,6 +512,14 @@ static void swap_discard_work(struct work_struct *work)
 	spin_unlock(&si->lock);
 }
 
+static void swap_users_ref_free(struct percpu_ref *ref)
+{
+	struct swap_info_struct *si;
+
+	si = container_of(ref, struct swap_info_struct, users);
+	complete(&si->comp);
+}
+
 static void alloc_cluster(struct swap_info_struct *si, unsigned long idx)
 {
 	struct swap_cluster_info *ci = si->cluster_info;
@@ -2500,7 +2509,7 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
 	 * Guarantee swap_map, cluster_info, etc. fields are valid
 	 * between get/put_swap_device() if SWP_VALID bit is set
 	 */
-	synchronize_rcu();
+	percpu_ref_resurrect(&p->users);
 	spin_lock(&swap_lock);
 	spin_lock(&p->lock);
 	_enable_swap_info(p);
@@ -2621,11 +2630,18 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	p->flags &= ~SWP_VALID; /* mark swap device as invalid */
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
+
+	percpu_ref_kill(&p->users);
 	/*
-	 * wait for swap operations protected by get/put_swap_device()
-	 * to complete
+	 * We need synchronize_rcu() here to protect the accessing
+	 * to the swap cache data structure.
 	 */
 	synchronize_rcu();
+	/*
+	 * Wait for swap operations protected by get/put_swap_device()
+	 * to complete.
+	 */
+	wait_for_completion(&p->comp);
 
 	flush_work(&p->discard_work);
 
@@ -3132,7 +3148,7 @@ static bool swap_discardable(struct swap_info_struct *si)
 SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 {
 	struct swap_info_struct *p;
-	struct filename *name;
+	struct filename *name = NULL;
 	struct file *swap_file = NULL;
 	struct address_space *mapping;
 	int prio;
@@ -3163,6 +3179,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 
 	INIT_WORK(&p->discard_work, swap_discard_work);
 
+	if (!p->ref_initialized) {
+		error = percpu_ref_init(&p->users, swap_users_ref_free,
+					PERCPU_REF_INIT_DEAD, GFP_KERNEL);
+		if (unlikely(error))
+			goto bad_swap;
+		init_completion(&p->comp);
+		p->ref_initialized = true;
+	}
+
 	name = getname(specialfile);
 	if (IS_ERR(name)) {
 		error = PTR_ERR(name);

From patchwork Sat Apr 17 09:40:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12209615 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67CB1C43460 for ; Sat, 17 Apr 2021 09:41:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B8914611EF for ; Sat, 17 Apr 2021 09:41:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B8914611EF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B71916B006E; Sat, 17 Apr 2021 05:41:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B00556B0070; Sat, 17 Apr 2021 05:41:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 99A396B0071; Sat, 17 Apr 2021 05:41:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0154.hostedemail.com [216.40.44.154]) by kanga.kvack.org (Postfix) with ESMTP id 7730C6B0070 for ; Sat, 17 Apr 2021 05:41:49 -0400 (EDT) Received: from smtpin37.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2C4AE180318E4 for ; Sat, 17 Apr 2021 09:41:49 +0000 (UTC) X-FDA: 78041367138.37.873B9D6 Received: from szxga06-in.huawei.com (szxga06-in.huawei.com [45.249.212.32]) by imf26.hostedemail.com (Postfix) with ESMTP id 31CFF40002C6 for ; Sat, 17 Apr 2021 09:41:43 +0000 (UTC) Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FMp2b2Sy1zlYMq; Sat, 17 Apr 2021 17:39:51 +0800 (CST) Received: from huawei.com (10.175.104.175) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.498.0; Sat, 17 
Apr 2021 17:41:38 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , Subject: [PATCH v2 2/5] mm/swapfile: use percpu_ref to serialize against concurrent swapoff Date: Sat, 17 Apr 2021 05:40:36 -0400 Message-ID: <20210417094039.51711-3-linmiaohe@huawei.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20210417094039.51711-1-linmiaohe@huawei.com> References: <20210417094039.51711-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.175] X-CFilter-Loop: Reflected X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 31CFF40002C6 X-Stat-Signature: bpnibnt38aa6edffmp3i1h9fmtmr1sk8 Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf26; identity=mailfrom; envelope-from=""; helo=szxga06-in.huawei.com; client-ip=45.249.212.32 X-HE-DKIM-Result: none/none X-HE-Tag: 1618652503-458781 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Use percpu_ref to serialize against concurrent swapoff. Also remove the
SWP_VALID flag because it is only used together with the RCU solution.

Signed-off-by: Miaohe Lin
---
 include/linux/swap.h |  3 +--
 mm/swapfile.c        | 43 +++++++++++++++++--------------------------
 2 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8be36eb58b7a..993693b38109 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -177,7 +177,6 @@ enum {
 	SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */
 	SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */
 	SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
-	SWP_VALID = (1 << 13), /* swap is valid to be operated on? */
 	/* add others here before... */
 	SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */
 };
@@ -514,7 +513,7 @@ sector_t swap_page_sector(struct page *page);
 
 static inline void put_swap_device(struct swap_info_struct *si)
 {
-	rcu_read_unlock();
+	percpu_ref_put(&si->users);
 }
 
 #else /* CONFIG_SWAP */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 66515a3a2824..90e197bc2eeb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1279,18 +1279,12 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
  * via preventing the swap device from being swapoff, until
  * put_swap_device() is called. Otherwise return NULL.
  *
- * The entirety of the RCU read critical section must come before the
- * return from or after the call to synchronize_rcu() in
- * enable_swap_info() or swapoff(). So if "si->flags & SWP_VALID" is
- * true, the si->map, si->cluster_info, etc. must be valid in the
- * critical section.
- *
  * Notice that swapoff or swapoff+swapon can still happen before the
- * rcu_read_lock() in get_swap_device() or after the rcu_read_unlock()
- * in put_swap_device() if there isn't any other way to prevent
- * swapoff, such as page lock, page table lock, etc. The caller must
- * be prepared for that. For example, the following situation is
- * possible.
+ * percpu_ref_tryget_live() in get_swap_device() or after the
+ * percpu_ref_put() in put_swap_device() if there isn't any other way
+ * to prevent swapoff, such as page lock, page table lock, etc. The
+ * caller must be prepared for that. For example, the following
+ * situation is possible.
  *
  *   CPU1				CPU2
  *   do_swap_page()
@@ -1318,21 +1312,24 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	si = swp_swap_info(entry);
 	if (!si)
 		goto bad_nofile;
-
-	rcu_read_lock();
-	if (data_race(!(si->flags & SWP_VALID)))
-		goto unlock_out;
+	if (!percpu_ref_tryget_live(&si->users))
+		goto out;
+	/*
+	 * Guarantee we will not reference uninitialized fields
+	 * of swap_info_struct.
+	 */
+	smp_rmb();
 	offset = swp_offset(entry);
 	if (offset >= si->max)
-		goto unlock_out;
+		goto put_out;
 
 	return si;
 bad_nofile:
 	pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val);
 out:
 	return NULL;
-unlock_out:
-	rcu_read_unlock();
+put_out:
+	percpu_ref_put(&si->users);
 	return NULL;
 }
 
@@ -2475,7 +2472,7 @@ static void setup_swap_info(struct swap_info_struct *p, int prio,
 
 static void _enable_swap_info(struct swap_info_struct *p)
 {
-	p->flags |= SWP_WRITEOK | SWP_VALID;
+	p->flags |= SWP_WRITEOK;
 	atomic_long_add(p->pages, &nr_swap_pages);
 	total_swap_pages += p->pages;
 
@@ -2507,7 +2504,7 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
 	spin_unlock(&swap_lock);
 	/*
 	 * Guarantee swap_map, cluster_info, etc. fields are valid
-	 * between get/put_swap_device() if SWP_VALID bit is set
+	 * between get/put_swap_device().
 	 */
 	percpu_ref_resurrect(&p->users);
 	spin_lock(&swap_lock);
@@ -2625,12 +2622,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 
 	reenable_swap_slots_cache_unlock();
 
-	spin_lock(&swap_lock);
-	spin_lock(&p->lock);
-	p->flags &= ~SWP_VALID; /* mark swap device as invalid */
-	spin_unlock(&p->lock);
-	spin_unlock(&swap_lock);
-
 	percpu_ref_kill(&p->users);
 	/*
 	 * We need synchronize_rcu() here to protect the accessing

From patchwork Sat Apr 17 09:40:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12209621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EB4CC43461 for ; Sat, 17 Apr 2021 09:42:00
+0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DBE6561026 for ; Sat, 17 Apr 2021 09:41:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DBE6561026 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 46DB46B0073; Sat, 17 Apr 2021 05:41:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 41DBC6B0074; Sat, 17 Apr 2021 05:41:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 29CA76B0075; Sat, 17 Apr 2021 05:41:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0037.hostedemail.com [216.40.44.37]) by kanga.kvack.org (Postfix) with ESMTP id F2BF46B0073 for ; Sat, 17 Apr 2021 05:41:54 -0400 (EDT) Received: from smtpin36.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B1F55180ACF7F for ; Sat, 17 Apr 2021 09:41:54 +0000 (UTC) X-FDA: 78041367348.36.03F2569 Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) by imf30.hostedemail.com (Postfix) with ESMTP id 96F9DE00010B for ; Sat, 17 Apr 2021 09:41:40 +0000 (UTC) Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4FMp2C1LWnzyPLG; Sat, 17 Apr 2021 17:39:31 +0800 (CST) Received: from huawei.com (10.175.104.175) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.498.0; Sat, 17 Apr 2021 17:41:39 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , Subject: [PATCH v2 3/5] swap: fix do_swap_page() race with swapoff Date: Sat, 17 Apr 2021 05:40:37 -0400 Message-ID: <20210417094039.51711-4-linmiaohe@huawei.com> X-Mailer: git-send-email 2.19.1 
In-Reply-To: <20210417094039.51711-1-linmiaohe@huawei.com> References: <20210417094039.51711-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.175] X-CFilter-Loop: Reflected X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 96F9DE00010B X-Stat-Signature: sbjbjf86b1w1us6szqm3gpp9xa333hcm Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf30; identity=mailfrom; envelope-from=""; helo=szxga05-in.huawei.com; client-ip=45.249.212.191 X-HE-DKIM-Result: none/none X-HE-Tag: 1618652500-286281 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

When I was investigating the swap code, I found the below possible race
window:

CPU 1                                         CPU 2
-----                                         -----
do_swap_page
  swap_readpage(skip swap cache case)
    if (data_race(sis->flags & SWP_FS_OPS)) {
                                              swapoff
                                                p->flags &= ~SWP_VALID;
                                                ..
                                                synchronize_rcu();
                                                ..
                                                p->swap_file = NULL;
    struct file *swap_file = sis->swap_file;
    struct address_space *mapping = swap_file->f_mapping;[oops!]

Note that for pages that are swapped in through the swap cache, this
isn't an issue: the page is locked and the swap entry is marked with
SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been
unlocked.

Using the current get/put_swap_device() to guard against concurrent
swapoff for swap_readpage() looks terrible because swap_readpage() may
take a really long time. And this race may not be really pernicious
because swapoff is usually done only at system shutdown. To reduce the
performance overhead on the hot path as much as possible, it appears we
can use percpu_ref to close this race window (as suggested by Huang,
Ying).

Fixes: 0bcac06f27d7 ("mm,swap: skip swapcache for swapin of synchronous device")
Reported-by: kernel test robot (auto build test ERROR)
Signed-off-by: Miaohe Lin
---
 include/linux/swap.h | 9 +++++++++
 mm/memory.c          | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 993693b38109..523c2411a135 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -528,6 +528,15 @@ static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 	return NULL;
 }
 
+static inline struct swap_info_struct *get_swap_device(swp_entry_t entry)
+{
+	return NULL;
+}
+
+static inline void put_swap_device(struct swap_info_struct *si)
+{
+}
+
 #define swap_address_space(entry) (NULL)
 #define get_nr_swap_pages() 0L
 #define total_swap_pages 0L
diff --git a/mm/memory.c b/mm/memory.c
index 27014c3bde9f..7a2fe12cf641 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3311,6 +3311,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page = NULL, *swapcache;
+	struct swap_info_struct *si = NULL;
 	swp_entry_t entry;
 	pte_t pte;
 	int locked;
@@ -3338,6 +3339,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out;
 	}
 
+	/* Prevent swapoff from happening to us. */
+	si = get_swap_device(entry);
+	if (unlikely(!si))
+		goto out;
 
 	delayacct_set_flag(current, DELAYACCT_PF_SWAPIN);
 	page = lookup_swap_cache(entry, vma, vmf->address);
@@ -3514,6 +3519,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
+	if (si)
+		put_swap_device(si);
 	return ret;
 out_nomap:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -3525,6 +3532,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		unlock_page(swapcache);
 		put_page(swapcache);
 	}
+	if (si)
+		put_swap_device(si);
 	return ret;
 }
 

From patchwork Sat Apr 17 09:40:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12209619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26839C433ED for ; Sat, 17 Apr 2021 09:41:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CDF7F611AC for ; Sat, 17 Apr 2021 09:41:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CDF7F611AC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 297E46B0071; Sat, 17 Apr 2021 05:41:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2742B6B0073; Sat, 17 Apr 2021 05:41:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id
E817B6B0075; Sat, 17 Apr 2021 05:41:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0164.hostedemail.com [216.40.44.164]) by kanga.kvack.org (Postfix) with ESMTP id B083A6B0071 for ; Sat, 17 Apr 2021 05:41:53 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 2D61D4DDB for ; Sat, 17 Apr 2021 09:41:53 +0000 (UTC) X-FDA: 78041367306.26.9F6C2EF Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) by imf02.hostedemail.com (Postfix) with ESMTP id 8F1F740002C8 for ; Sat, 17 Apr 2021 09:41:33 +0000 (UTC) Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4FMp2C0yb9zyPGg; Sat, 17 Apr 2021 17:39:31 +0800 (CST) Received: from huawei.com (10.175.104.175) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.498.0; Sat, 17 Apr 2021 17:41:39 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , Subject: [PATCH v2 4/5] mm/swap: remove confusing checking for non_swap_entry() in swap_ra_info() Date: Sat, 17 Apr 2021 05:40:38 -0400 Message-ID: <20210417094039.51711-5-linmiaohe@huawei.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20210417094039.51711-1-linmiaohe@huawei.com> References: <20210417094039.51711-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.175] X-CFilter-Loop: Reflected X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 8F1F740002C8 X-Stat-Signature: r9rjhubouq8a89o3efnu34hmch5rbcgq Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf02; identity=mailfrom; envelope-from=""; helo=szxga05-in.huawei.com; client-ip=45.249.212.191 X-HE-DKIM-Result: none/none X-HE-Tag: 1618652493-126388 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

While we released the pte lock, somebody else might have faulted in this
pte. So we should check whether it's a swap pte first to guard against
such a race, or swp_type would be unexpected. But the swap entry isn't
used in this function, and we will have enough checking when we really
operate on the PTE entries later. So the check for non_swap_entry() is
not really needed here and should be removed to avoid confusion.

Signed-off-by: Miaohe Lin
---
 mm/swap_state.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 272ea2108c9d..df5405384520 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -721,7 +721,6 @@ static void swap_ra_info(struct vm_fault *vmf,
 {
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long ra_val;
-	swp_entry_t entry;
 	unsigned long faddr, pfn, fpfn;
 	unsigned long start, end;
 	pte_t *pte, *orig_pte;
@@ -739,11 +738,6 @@ static void swap_ra_info(struct vm_fault *vmf,
 
 	faddr = vmf->address;
 	orig_pte = pte = pte_offset_map(vmf->pmd, faddr);
-	entry = pte_to_swp_entry(*pte);
-	if ((unlikely(non_swap_entry(entry)))) {
-		pte_unmap(orig_pte);
-		return;
-	}
 
 	fpfn = PFN_DOWN(faddr);
 	ra_val = GET_SWAP_RA_VAL(vma);

From patchwork Sat Apr 17 09:40:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12209617 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3BC35C43470 for ; Sat, 17 Apr 2021 09:41:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP
id D8A6A61026 for ; Sat, 17 Apr 2021 09:41:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D8A6A61026 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D47B26B0072; Sat, 17 Apr 2021 05:41:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D1F8C6B0074; Sat, 17 Apr 2021 05:41:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AFB146B0073; Sat, 17 Apr 2021 05:41:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 8555A6B0071 for ; Sat, 17 Apr 2021 05:41:53 -0400 (EDT) Received: from smtpin32.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 2FC8552A8 for ; Sat, 17 Apr 2021 09:41:53 +0000 (UTC) X-FDA: 78041367306.32.03A6ADB Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) by imf29.hostedemail.com (Postfix) with ESMTP id 7E7E0D6 for ; Sat, 17 Apr 2021 09:41:50 +0000 (UTC) Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4FMp2C0YK7zyPBG; Sat, 17 Apr 2021 17:39:31 +0800 (CST) Received: from huawei.com (10.175.104.175) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.498.0; Sat, 17 Apr 2021 17:41:40 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , Subject: [PATCH v2 5/5] mm/shmem: fix shmem_swapin() race with swapoff Date: Sat, 17 Apr 2021 05:40:39 -0400 Message-ID: <20210417094039.51711-6-linmiaohe@huawei.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20210417094039.51711-1-linmiaohe@huawei.com> References: <20210417094039.51711-1-linmiaohe@huawei.com> 
MIME-Version: 1.0 X-Originating-IP: [10.175.104.175] X-CFilter-Loop: Reflected X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 7E7E0D6 X-Stat-Signature: y8oe4ikproghn3j88gxs35zqapzyisp4 Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf29; identity=mailfrom; envelope-from=""; helo=szxga05-in.huawei.com; client-ip=45.249.212.191 X-HE-DKIM-Result: none/none X-HE-Tag: 1618652510-303622 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

When I was investigating the swap code, I found the below possible race
window:

CPU 1                                         CPU 2
-----                                         -----
shmem_swapin
  swap_cluster_readahead
    if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) {
                                              swapoff
                                                si->flags &= ~SWP_VALID;
                                                ..
                                                synchronize_rcu();
                                                ..
                                                si->swap_file = NULL;
    struct inode *inode = si->swap_file->f_mapping->host;[oops!]

Close this race window by using get/put_swap_device() to guard against
concurrent swapoff.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Miaohe Lin
---
 mm/shmem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 26c76b13ad23..936ba5595297 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma)
 static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 			struct shmem_inode_info *info, pgoff_t index)
 {
+	struct swap_info_struct *si;
 	struct vm_area_struct pvma;
 	struct page *page;
 	struct vm_fault vmf = {
 		.vma = &pvma,
 	};
 
+	/* Prevent swapoff from happening to us. */
+	si = get_swap_device(swap);
+	if (unlikely(!si))
+		return NULL;
 	shmem_pseudo_vma_init(&pvma, info, index);
 	page = swap_cluster_readahead(swap, gfp, &vmf);
 	shmem_pseudo_vma_destroy(&pvma);
+	put_swap_device(si);
 
 	return page;
 }
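
Not part of the patch: for readers following the series, the lifecycle that
get_swap_device()/put_swap_device() and swapon/swapoff rely on (init dead,
resurrect, tryget_live, kill, wait for the last put) can be modeled in
userspace roughly as below. This is only a sketch under stated assumptions.
It uses one shared C11 atomic rather than the kernel's per-CPU counters, a
plain bool stands in for struct completion, and every toy_* / TOY_* name is
hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_REF_DEAD 1UL	/* low bit: no new users may enter */
#define TOY_REF_ONE  2UL	/* one reference, counted above the flag bit */

/* Hypothetical stand-in for struct percpu_ref: one atomic, not per-CPU. */
struct toy_ref {
	atomic_ulong state;
	void (*release)(struct toy_ref *ref);
};

/* Hypothetical stand-in for struct swap_info_struct. */
struct toy_swap_info {
	struct toy_ref users;
	bool comp_done;		/* stands in for struct completion comp */
};

/* Plays the role of swap_users_ref_free(): the last put signals the waiter. */
static void toy_users_ref_free(struct toy_ref *ref)
{
	struct toy_swap_info *si = (struct toy_swap_info *)
		((char *)ref - offsetof(struct toy_swap_info, users));

	si->comp_done = true;
}

/* Plays the role of percpu_ref_init(..., PERCPU_REF_INIT_DEAD, ...). */
static void toy_swap_init(struct toy_swap_info *si)
{
	atomic_init(&si->users.state, TOY_REF_DEAD);
	si->users.release = toy_users_ref_free;
	si->comp_done = false;
}

/* Plays the role of percpu_ref_resurrect(): live again, base ref held. */
static void toy_swapon(struct toy_swap_info *si)
{
	atomic_store(&si->users.state, TOY_REF_ONE);
}

/* Plays the role of get_swap_device(): succeeds only while live. */
static bool toy_get_swap_device(struct toy_swap_info *si)
{
	unsigned long v = atomic_load(&si->users.state);

	do {
		if (v & TOY_REF_DEAD)
			return false;
	} while (!atomic_compare_exchange_weak(&si->users.state, &v,
					       v + TOY_REF_ONE));
	return true;
}

/* Plays the role of put_swap_device(): the last put runs release(). */
static void toy_put_swap_device(struct toy_swap_info *si)
{
	unsigned long old = atomic_fetch_sub(&si->users.state, TOY_REF_ONE);

	if ((old & ~TOY_REF_DEAD) == TOY_REF_ONE)
		si->users.release(&si->users);
}

/* Plays the role of percpu_ref_kill(): refuse new users, drop base ref. */
static void toy_swapoff_kill(struct toy_swap_info *si)
{
	atomic_fetch_or(&si->users.state, TOY_REF_DEAD);
	toy_put_swap_device(si);
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
	struct toy_swap_info si;

	toy_swap_init(&si);
	assert(!toy_get_swap_device(&si));	/* before swapon: refused */
	toy_swapon(&si);
	assert(toy_get_swap_device(&si));	/* a reader enters */
	toy_swapoff_kill(&si);
	assert(!toy_get_swap_device(&si));	/* new readers are refused */
	assert(!si.comp_done);			/* swapoff must still wait */
	toy_put_swap_device(&si);		/* the reader leaves */
	assert(si.comp_done);			/* teardown may now proceed */
	puts("ok");
	return 0;
}
#endif
```

Building with -DTOY_DEMO_MAIN runs the sequence single-threaded. The point of
the model is the ordering: a reader that got in before the kill keeps the
device pinned until its put, which is what lets swapoff wait on the
completion before clearing fields such as swap_file.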