From patchwork Sun Apr 25 09:54:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12223195 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8DA20C433ED for ; Sun, 25 Apr 2021 09:54:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 20E7261164 for ; Sun, 25 Apr 2021 09:54:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 20E7261164 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D7CC66B006E; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B14876B0072; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7BA126B0074; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0170.hostedemail.com [216.40.44.170]) by kanga.kvack.org (Postfix) with ESMTP id 573486B006E for ; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 0A452180AC46F for ; Sun, 25 Apr 2021 09:54:31 +0000 (UTC) X-FDA: 78070429542.20.2015327 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by imf25.hostedemail.com (Postfix) with ESMTP id E9368600010F for ; Sun, 25 Apr 2021 09:54:25 +0000 (UTC) Received: from dggeme703-chm.china.huawei.com (unknown [172.30.72.53]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4FSjwv70Z6z5vCT; Sun, 25 Apr 2021 17:51:59 +0800 (CST) Received: from huawei.com (10.175.104.170) by dggeme703-chm.china.huawei.com (10.1.199.99) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Sun, 25 Apr 2021 17:54:24 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , , , Subject: [PATCH v5 1/4] mm/swapfile: use percpu_ref to serialize against concurrent swapoff Date: Sun, 25 Apr 2021 17:54:16 +0800 Message-ID: <20210425095419.3830298-2-linmiaohe@huawei.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20210425095419.3830298-1-linmiaohe@huawei.com> References: <20210425095419.3830298-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.170] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggeme703-chm.china.huawei.com (10.1.199.99) X-CFilter-Loop: Reflected X-Stat-Signature: 7smpqhrxzruok88jfexryikicgoir53p X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: E9368600010F Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf25; identity=mailfrom; envelope-from=""; helo=szxga02-in.huawei.com; client-ip=45.249.212.188 X-HE-DKIM-Result: none/none X-HE-Tag: 1619344465-49723 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Using current get/put_swap_device() to guard against concurrent swapoff for some swap ops, e.g. swap_readpage(), looks terrible because they might take really long time. This patch adds the percpu_ref support to serialize against concurrent swapoff(as suggested by Huang, Ying). Also we remove the SWP_VALID flag because it's used together with RCU solution. Reviewed-by: "Huang, Ying" Signed-off-by: Miaohe Lin --- include/linux/swap.h | 5 +-- mm/swapfile.c | 79 +++++++++++++++++++++++++++----------------- 2 files changed, 52 insertions(+), 32 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 144727041e78..c9e7fea10b83 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -177,7 +177,6 @@ enum { SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */ SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */ - SWP_VALID = (1 << 13), /* swap is valid to be operated on? */ /* add others here before... */ SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */ }; @@ -240,6 +239,7 @@ struct swap_cluster_list { * The in-memory structure used to track swap areas. */ struct swap_info_struct { + struct percpu_ref users; /* indicate and keep swap device valid. */ unsigned long flags; /* SWP_USED etc: see above */ signed short prio; /* swap priority of this type */ struct plist_node list; /* entry in swap_active_head */ @@ -260,6 +260,7 @@ struct swap_info_struct { struct block_device *bdev; /* swap device or bdev of swap file */ struct file *swap_file; /* seldom referenced */ unsigned int old_block_size; /* seldom referenced */ + struct completion comp; /* seldom referenced */ #ifdef CONFIG_FRONTSWAP unsigned long *frontswap_map; /* frontswap in-use, one bit per page */ atomic_t frontswap_pages; /* frontswap pages in-use counter */ @@ -511,7 +512,7 @@ sector_t swap_page_sector(struct page *page); static inline void put_swap_device(struct swap_info_struct *si) { - rcu_read_unlock(); + percpu_ref_put(&si->users); } #else /* CONFIG_SWAP */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 149e77454e3c..2aad85751991 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -511,6 +512,14 @@ static void swap_discard_work(struct work_struct *work) spin_unlock(&si->lock); } +static void swap_users_ref_free(struct percpu_ref *ref) +{ + struct swap_info_struct *si; + + si = container_of(ref, struct swap_info_struct, users); + complete(&si->comp); +} + static void alloc_cluster(struct swap_info_struct *si, unsigned long idx) { struct swap_cluster_info *ci = si->cluster_info; @@ -1270,18 +1279,12 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p, * via preventing the swap device from being swapoff, until * put_swap_device() is called. Otherwise return NULL. * - * The entirety of the RCU read critical section must come before the - * return from or after the call to synchronize_rcu() in - * enable_swap_info() or swapoff(). So if "si->flags & SWP_VALID" is - * true, the si->map, si->cluster_info, etc. must be valid in the - * critical section. - * * Notice that swapoff or swapoff+swapon can still happen before the - * rcu_read_lock() in get_swap_device() or after the rcu_read_unlock() - * in put_swap_device() if there isn't any other way to prevent - * swapoff, such as page lock, page table lock, etc. The caller must - * be prepared for that. For example, the following situation is - * possible. + * percpu_ref_tryget_live() in get_swap_device() or after the + * percpu_ref_put() in put_swap_device() if there isn't any other way + * to prevent swapoff, such as page lock, page table lock, etc. The + * caller must be prepared for that. For example, the following + * situation is possible. * * CPU1 CPU2 * do_swap_page() @@ -1309,21 +1312,27 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) si = swp_swap_info(entry); if (!si) goto bad_nofile; - - rcu_read_lock(); - if (data_race(!(si->flags & SWP_VALID))) - goto unlock_out; + if (!percpu_ref_tryget_live(&si->users)) + goto out; + /* + * Guarantee the si->users are checked before accessing other + * fields of swap_info_struct. + * + * Paired with the spin_unlock() after setup_swap_info() in + * enable_swap_info(). + */ + smp_rmb(); offset = swp_offset(entry); if (offset >= si->max) - goto unlock_out; + goto put_out; return si; bad_nofile: pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val); out: return NULL; -unlock_out: - rcu_read_unlock(); +put_out: + percpu_ref_put(&si->users); return NULL; } @@ -2466,7 +2475,7 @@ static void setup_swap_info(struct swap_info_struct *p, int prio, static void _enable_swap_info(struct swap_info_struct *p) { - p->flags |= SWP_WRITEOK | SWP_VALID; + p->flags |= SWP_WRITEOK; atomic_long_add(p->pages, &nr_swap_pages); total_swap_pages += p->pages; @@ -2497,10 +2506,9 @@ static void enable_swap_info(struct swap_info_struct *p, int prio, spin_unlock(&p->lock); spin_unlock(&swap_lock); /* - * Guarantee swap_map, cluster_info, etc. fields are valid - * between get/put_swap_device() if SWP_VALID bit is set + * Finished initializing swap device, now it's safe to reference it. */ - synchronize_rcu(); + percpu_ref_resurrect(&p->users); spin_lock(&swap_lock); spin_lock(&p->lock); _enable_swap_info(p); @@ -2616,16 +2624,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) reenable_swap_slots_cache_unlock(); - spin_lock(&swap_lock); - spin_lock(&p->lock); - p->flags &= ~SWP_VALID; /* mark swap device as invalid */ - spin_unlock(&p->lock); - spin_unlock(&swap_lock); /* - * wait for swap operations protected by get/put_swap_device() - * to complete + * Wait for swap operations protected by get/put_swap_device() + * to complete. + * + * We need synchronize_rcu() here to protect the accessing to + * the swap cache data structure. */ + percpu_ref_kill(&p->users); synchronize_rcu(); + wait_for_completion(&p->comp); flush_work(&p->discard_work); @@ -2857,6 +2865,12 @@ static struct swap_info_struct *alloc_swap_info(void) if (!p) return ERR_PTR(-ENOMEM); + if (percpu_ref_init(&p->users, swap_users_ref_free, + PERCPU_REF_INIT_DEAD, GFP_KERNEL)) { + kvfree(p); + return ERR_PTR(-ENOMEM); + } + spin_lock(&swap_lock); for (type = 0; type < nr_swapfiles; type++) { if (!(swap_info[type]->flags & SWP_USED)) @@ -2864,6 +2878,7 @@ static struct swap_info_struct *alloc_swap_info(void) } if (type >= MAX_SWAPFILES) { spin_unlock(&swap_lock); + percpu_ref_exit(&p->users); kvfree(p); return ERR_PTR(-EPERM); } @@ -2891,9 +2906,13 @@ static struct swap_info_struct *alloc_swap_info(void) plist_node_init(&p->avail_lists[i], 0); p->flags = SWP_USED; spin_unlock(&swap_lock); - kvfree(defer); + if (defer) { + percpu_ref_exit(&defer->users); + kvfree(defer); + } spin_lock_init(&p->lock); spin_lock_init(&p->cont_lock); + init_completion(&p->comp); return p; } From patchwork Sun Apr 25 09:54:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12223193 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A280C433ED for ; Sun, 25 Apr 2021 09:54:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0A3B86128B for ; Sun, 25 Apr 2021 09:54:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0A3B86128B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B80276B006C; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8F2756B0071; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6D0346B0036; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0037.hostedemail.com [216.40.44.37]) by kanga.kvack.org (Postfix) with ESMTP id 4A1D66B0070 for ; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) Received: from smtpin32.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 026536D74 for ; Sun, 25 Apr 2021 09:54:31 +0000 (UTC) X-FDA: 78070429542.32.F141342 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by imf14.hostedemail.com (Postfix) with ESMTP id 85899C0007C9 for ; Sun, 25 Apr 2021 09:54:16 +0000 (UTC) Received: from dggeme703-chm.china.huawei.com (unknown [172.30.72.57]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4FSjx754t7zRfyH; Sun, 25 Apr 2021 17:52:11 +0800 (CST) Received: from huawei.com (10.175.104.170) by dggeme703-chm.china.huawei.com (10.1.199.99) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Sun, 25 Apr 2021 17:54:25 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , , , Subject: [PATCH v5 2/4] swap: fix do_swap_page() race with swapoff Date: Sun, 25 Apr 2021 17:54:17 +0800 Message-ID: <20210425095419.3830298-3-linmiaohe@huawei.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20210425095419.3830298-1-linmiaohe@huawei.com> References: <20210425095419.3830298-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.170] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggeme703-chm.china.huawei.com (10.1.199.99) X-CFilter-Loop: Reflected X-Rspamd-Queue-Id: 85899C0007C9 X-Stat-Signature: p4jsm39s36cygmm3owkw9jy89jua5cr7 X-Rspamd-Server: rspam02 Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf14; identity=mailfrom; envelope-from=""; helo=szxga02-in.huawei.com; client-ip=45.249.212.188 X-HE-DKIM-Result: none/none X-HE-Tag: 1619344456-642753 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When I was investigating the swap code, I found the below possible race window: CPU 1 CPU 2 ----- ----- do_swap_page if (data_race(si->flags & SWP_SYNCHRONOUS_IO) swap_readpage if (data_race(sis->flags & SWP_FS_OPS)) { swapoff .. p->swap_file = NULL; .. struct file *swap_file = sis->swap_file; struct address_space *mapping = swap_file->f_mapping;[oops!] Note that for the pages that are swapped in through swap cache, this isn't an issue. Because the page is locked, and the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() can not proceed until the page has been unlocked. Fix this race by using get/put_swap_device() to guard against concurrent swapoff. Fixes: 0bcac06f27d7 ("mm,swap: skip swapcache for swapin of synchronous device") Reported-by: kernel test robot (auto build test ERROR) Reviewed-by: "Huang, Ying" Signed-off-by: Miaohe Lin --- include/linux/swap.h | 9 +++++++++ mm/memory.c | 11 +++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index c9e7fea10b83..46d51d058d05 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -527,6 +527,15 @@ static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) return NULL; } +static inline struct swap_info_struct *get_swap_device(swp_entry_t entry) +{ + return NULL; +} + +static inline void put_swap_device(struct swap_info_struct *si) +{ +} + #define swap_address_space(entry) (NULL) #define get_nr_swap_pages() 0L #define total_swap_pages 0L diff --git a/mm/memory.c b/mm/memory.c index 27014c3bde9f..39c910678387 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3311,6 +3311,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct page *page = NULL, *swapcache; + struct swap_info_struct *si = NULL; swp_entry_t entry; pte_t pte; int locked; @@ -3338,14 +3339,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) goto out; } + /* Prevent swapoff from happening to us. */ + si = get_swap_device(entry); + if (unlikely(!si)) + goto out; delayacct_set_flag(current, DELAYACCT_PF_SWAPIN); page = lookup_swap_cache(entry, vma, vmf->address); swapcache = page; if (!page) { - struct swap_info_struct *si = swp_swap_info(entry); - if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) { /* skip swapcache */ @@ -3514,6 +3517,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); out: + if (si) + put_swap_device(si); return ret; out_nomap: pte_unmap_unlock(vmf->pte, vmf->ptl); @@ -3525,6 +3530,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) unlock_page(swapcache); put_page(swapcache); } + if (si) + put_swap_device(si); return ret; } From patchwork Sun Apr 25 09:54:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12223191 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02C4CC433ED for ; Sun, 25 Apr 2021 09:54:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8B30E611C9 for ; Sun, 25 Apr 2021 09:54:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8B30E611C9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 96E4D6B0036; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 855C06B006C; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E59C6B0072; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0125.hostedemail.com [216.40.44.125]) by kanga.kvack.org (Postfix) with ESMTP id 2A93B6B0036 for ; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id CE0096D65 for ; Sun, 25 Apr 2021 09:54:30 +0000 (UTC) X-FDA: 78070429500.21.6619341 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by imf28.hostedemail.com (Postfix) with ESMTP id 135992000241 for ; Sun, 25 Apr 2021 09:54:31 +0000 (UTC) Received: from dggeme703-chm.china.huawei.com (unknown [172.30.72.54]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4FSjwx36CDz5v8p; Sun, 25 Apr 2021 17:52:01 +0800 (CST) Received: from huawei.com (10.175.104.170) by dggeme703-chm.china.huawei.com (10.1.199.99) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Sun, 25 Apr 2021 17:54:26 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , , , Subject: [PATCH v5 3/4] mm/swap: remove confusing checking for non_swap_entry() in swap_ra_info() Date: Sun, 25 Apr 2021 17:54:18 +0800 Message-ID: <20210425095419.3830298-4-linmiaohe@huawei.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20210425095419.3830298-1-linmiaohe@huawei.com> References: <20210425095419.3830298-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.170] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggeme703-chm.china.huawei.com (10.1.199.99) X-CFilter-Loop: Reflected X-Stat-Signature: pji466n4o37553x51phyzp1bb7z1sogo X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 135992000241 Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf28; identity=mailfrom; envelope-from=""; helo=szxga02-in.huawei.com; client-ip=45.249.212.188 X-HE-DKIM-Result: none/none X-HE-Tag: 1619344471-249923 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The non_swap_entry() was used for working with VMA based swap readahead via commit ec560175c0b6 ("mm, swap: VMA based swap readahead"). At that time, the non_swap_entry() checking is necessary because the function is called before checking that in do_swap_page(). Then it's moved to swap_ra_info() since commit eaf649ebc3ac ("mm: swap: clean up swap readahead"). After that, the non_swap_entry() checking is unnecessary, because swap_ra_info() is called after non_swap_entry() has been checked already. The resulting code is confusing as the non_swap_entry() check looks racy now because while we released the pte lock, somebody else might have faulted in this pte. So we should check whether it's swap pte first to guard against such race or swap_type will be unexpected. But the race isn't important because it will not cause problem. We would have enough checking when we really operate the PTE entries later. So we remove the non_swap_entry() check here to avoid confusion. Reviewed-by: "Huang, Ying" Signed-off-by: Miaohe Lin --- mm/swap_state.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/mm/swap_state.c b/mm/swap_state.c index 272ea2108c9d..df5405384520 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -721,7 +721,6 @@ static void swap_ra_info(struct vm_fault *vmf, { struct vm_area_struct *vma = vmf->vma; unsigned long ra_val; - swp_entry_t entry; unsigned long faddr, pfn, fpfn; unsigned long start, end; pte_t *pte, *orig_pte; @@ -739,11 +738,6 @@ static void swap_ra_info(struct vm_fault *vmf, faddr = vmf->address; orig_pte = pte = pte_offset_map(vmf->pmd, faddr); - entry = pte_to_swp_entry(*pte); - if ((unlikely(non_swap_entry(entry)))) { - pte_unmap(orig_pte); - return; - } fpfn = PFN_DOWN(faddr); ra_val = GET_SWAP_RA_VAL(vma); From patchwork Sun Apr 25 09:54:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miaohe Lin X-Patchwork-Id: 12223197 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DBAFCC433B4 for ; Sun, 25 Apr 2021 09:54:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6BA2C611C9 for ; Sun, 25 Apr 2021 09:54:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6BA2C611C9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3C6246B0071; Sun, 25 Apr 2021 05:54:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 39C936B0072; Sun, 25 Apr 2021 05:54:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1CF566B0073; Sun, 25 Apr 2021 05:54:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0183.hostedemail.com [216.40.44.183]) by kanga.kvack.org (Postfix) with ESMTP id E8F906B0071 for ; Sun, 25 Apr 2021 05:54:31 -0400 (EDT) Received: from smtpin40.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id A2946180AC46F for ; Sun, 25 Apr 2021 09:54:31 +0000 (UTC) X-FDA: 78070429542.40.6F787AD Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by imf19.hostedemail.com (Postfix) with ESMTP id C499D90009EF for ; Sun, 25 Apr 2021 09:54:03 +0000 (UTC) Received: from dggeme703-chm.china.huawei.com (unknown [172.30.72.53]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4FSjvH0NYbzWKHR; Sun, 25 Apr 2021 17:50:35 +0800 (CST) Received: from huawei.com (10.175.104.170) by dggeme703-chm.china.huawei.com (10.1.199.99) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Sun, 25 Apr 2021 17:54:27 +0800 From: Miaohe Lin To: CC: , , , , , , , , , , , , , , , Subject: [PATCH v5 4/4] mm/shmem: fix shmem_swapin() race with swapoff Date: Sun, 25 Apr 2021 17:54:19 +0800 Message-ID: <20210425095419.3830298-5-linmiaohe@huawei.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20210425095419.3830298-1-linmiaohe@huawei.com> References: <20210425095419.3830298-1-linmiaohe@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.104.170] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggeme703-chm.china.huawei.com (10.1.199.99) X-CFilter-Loop: Reflected X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: C499D90009EF X-Stat-Signature: e71m3dk5jhcfr96ttdbk13hpgru5wug4 Received-SPF: none (huawei.com>: No applicable sender policy available) receiver=imf19; identity=mailfrom; envelope-from=""; helo=szxga01-in.huawei.com; client-ip=45.249.212.187 X-HE-DKIM-Result: none/none X-HE-Tag: 1619344443-714185 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When I was investigating the swap code, I found the below possible race window: CPU 1 CPU 2 ----- ----- shmem_swapin swap_cluster_readahead if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { swapoff .. si->swap_file = NULL; .. struct inode *inode = si->swap_file->f_mapping->host;[oops!] Close this race window by using get/put_swap_device() to guard against concurrent swapoff. Fixes: 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested or not") Signed-off-by: Miaohe Lin Reviewed-by: "Huang, Ying" --- mm/shmem.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index 26c76b13ad23..2dafd65b0b42 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1696,6 +1696,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, struct address_space *mapping = inode->i_mapping; struct shmem_inode_info *info = SHMEM_I(inode); struct mm_struct *charge_mm = vma ? vma->vm_mm : current->mm; + struct swap_info_struct *si; struct page *page; swp_entry_t swap; int error; @@ -1704,6 +1705,12 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, swap = radix_to_swp_entry(*pagep); *pagep = NULL; + /* Prevent swapoff from happening to us. */ + si = get_swap_device(swap); + if (!si) { + error = EINVAL; + goto failed; + } /* Look it up and read it in.. */ page = lookup_swap_cache(swap, NULL, 0); if (!page) { @@ -1765,6 +1772,8 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, swap_free(swap); *pagep = page; + if (si) + put_swap_device(si); return 0; failed: if (!shmem_confirm_swap(mapping, index, swap)) @@ -1775,6 +1784,9 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, put_page(page); } + if (si) + put_swap_device(si); + return error; }