From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736818 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87B37C433EF for ; Mon, 7 Feb 2022 04:46:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0B81B6B0073; Sun, 6 Feb 2022 23:46:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 067286B0074; Sun, 6 Feb 2022 23:46:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E4A506B0075; Sun, 6 Feb 2022 23:46:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0001.hostedemail.com [216.40.44.1]) by kanga.kvack.org (Postfix) with ESMTP id D777C6B0073 for ; Sun, 6 Feb 2022 23:46:55 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9F5E688489 for ; Mon, 7 Feb 2022 04:46:55 +0000 (UTC) X-FDA: 79114748790.24.19E686D Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf23.hostedemail.com (Postfix) with ESMTP id 0D843140005 for ; Mon, 7 Feb 2022 04:46:54 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id A0E02210EA; Mon, 7 Feb 2022 04:46:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209213; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zqNHdmgKRj+6NKQTNucmcQ/3yI6GnNFeF8QV0taIvqU=; b=UdkWSisb5z2djtr4WtffIH/FC/walG388xtY8XWMza5YEHEbz18XRACx0rJbgvE5UbKuEI dnbpyq5HHKeSLVcumrXrrCDrvPfDBIZURROExY/MO5qw3GMOcjeqle5LcoHLHZpjorQh5v pmqoRmJyt2OtF9VRU2GB5ak1QVtxtTQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209213; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zqNHdmgKRj+6NKQTNucmcQ/3yI6GnNFeF8QV0taIvqU=; b=Xn6zgT+aoThSWUi6NGaL2qiCFheonaj0IctNnotKuuI1lcJVmdsoWkOwZJbJ/1QbDN+0aL ex5+O8T4lbMtKvDw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5DF131330E; Mon, 7 Feb 2022 04:46:50 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id dqvkBjqkAGILNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:46:50 +0000 Subject: [PATCH 01/21] MM: create new mm/swap.h header file. From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916109.29374.8959231877111146366.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=UdkWSisb; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=Xn6zgT+a; spf=pass (imf23.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspam-User: nil X-Rspamd-Queue-Id: 0D843140005 X-Stat-Signature: 4zkmesw3dyic57xo1owh5y8cztxag9ck X-Rspamd-Server: rspam12 X-HE-Tag: 1644209214-939967 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Many functions declared in include/linux/swap.h are only used within mm/ Create a new "mm/swap.h" and move some of these declarations there. Remove the redundant 'extern' from the function declarations. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- include/linux/swap.h | 121 ----------------------------------------------- mm/madvise.c | 1 mm/memcontrol.c | 1 mm/memory.c | 1 mm/mincore.c | 1 mm/page_alloc.c | 1 mm/page_io.c | 1 mm/shmem.c | 1 mm/swap.h | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++ mm/swap_state.c | 1 mm/swapfile.c | 1 mm/util.c | 1 mm/vmscan.c | 1 13 files changed, 140 insertions(+), 121 deletions(-) create mode 100644 mm/swap.h diff --git a/include/linux/swap.h b/include/linux/swap.h index 1d38d9475c4d..3f54a8941c9d 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -419,62 +419,19 @@ extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP -#include /* for bio_end_io_t */ - -/* linux/mm/page_io.c */ -extern int swap_readpage(struct page *page, bool do_poll); -extern int swap_writepage(struct page *page, struct writeback_control *wbc); -extern void end_swap_bio_write(struct bio *bio); -extern int __swap_writepage(struct page *page, struct writeback_control *wbc, - bio_end_io_t end_write_func); extern int swap_set_page_dirty(struct page *page); - int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, sector_t *); -/* linux/mm/swap_state.c */ -/* One swap address space for each 64M swap space */ -#define SWAP_ADDRESS_SPACE_SHIFT 14 -#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) -extern struct address_space *swapper_spaces[]; -#define swap_address_space(entry) \ - (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ - >> SWAP_ADDRESS_SPACE_SHIFT]) static inline unsigned long total_swapcache_pages(void) { return global_node_page_state(NR_SWAPCACHE); } -extern void show_swap_cache_info(void); -extern int add_to_swap(struct page *page); -extern void *get_shadow_from_swap_cache(swp_entry_t entry); -extern int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp, void **shadowp); -extern void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow); -extern void delete_from_swap_cache(struct page *); -extern void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end); -extern void free_swap_cache(struct page *); extern void free_page_and_swap_cache(struct page *); extern void free_pages_and_swap_cache(struct page **, int); -extern struct page *lookup_swap_cache(swp_entry_t entry, - struct vm_area_struct *vma, - unsigned long addr); -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); -extern struct page *read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool do_poll); -extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool *new_page_allocated); -extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); -extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); - /* linux/mm/swapfile.c */ extern atomic_long_t nr_swap_pages; extern long total_swap_pages; @@ -528,12 +485,6 @@ static inline void put_swap_device(struct swap_info_struct *si) } #else /* CONFIG_SWAP */ - -static inline int swap_readpage(struct page *page, bool do_poll) -{ - return 0; -} - static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) { return NULL; @@ -548,11 +499,6 @@ static inline void put_swap_device(struct swap_info_struct *si) { } -static inline struct address_space *swap_address_space(swp_entry_t entry) -{ - return NULL; -} - #define get_nr_swap_pages() 0L #define total_swap_pages 0L #define total_swapcache_pages() 0UL @@ -567,14 +513,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry) #define free_pages_and_swap_cache(pages, nr) \ release_pages((pages), (nr)); -static inline void free_swap_cache(struct page *page) -{ -} - -static inline void show_swap_cache_info(void) -{ -} - /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */ #define free_swap_and_cache(e) is_pfn_swap_entry(e) @@ -600,65 +538,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp) { } -static inline struct page *swap_cluster_readahead(swp_entry_t entry, - gfp_t gfp_mask, struct vm_fault *vmf) -{ - return NULL; -} - -static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, - struct vm_fault *vmf) -{ - return NULL; -} - -static inline int swap_writepage(struct page *p, struct writeback_control *wbc) -{ - return 0; -} - -static inline struct page *lookup_swap_cache(swp_entry_t swp, - struct vm_area_struct *vma, - unsigned long addr) -{ - return NULL; -} - -static inline -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) -{ - return find_get_page(mapping, index); -} - -static inline int add_to_swap(struct page *page) -{ - return 0; -} - -static inline void *get_shadow_from_swap_cache(swp_entry_t entry) -{ - return NULL; -} - -static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp_mask, void **shadowp) -{ - return -1; -} - -static inline void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow) -{ -} - -static inline void delete_from_swap_cache(struct page *page) -{ -} - -static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end) -{ -} static inline int page_swapcount(struct page *page) { diff --git a/mm/madvise.c b/mm/madvise.c index 5604064df464..1ee4b7583379 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -35,6 +35,7 @@ #include #include "internal.h" +#include "swap.h" struct madvise_walk_private { struct mmu_gather *tlb; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 09d342c7cbd0..9b7c8181a207 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -66,6 +66,7 @@ #include #include #include "slab.h" +#include "swap.h" #include diff --git a/mm/memory.c b/mm/memory.c index c125c4969913..d25372340107 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -86,6 +86,7 @@ #include "pgalloc-track.h" #include "internal.h" +#include "swap.h" #if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST) #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid. diff --git a/mm/mincore.c b/mm/mincore.c index 9122676b54d6..f4f627325e12 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -20,6 +20,7 @@ #include #include +#include "swap.h" static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long end, struct mm_walk *walk) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3589febc6d31..221aa3c10b78 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -81,6 +81,7 @@ #include "internal.h" #include "shuffle.h" #include "page_reporting.h" +#include "swap.h" /* Free Page Internal flags: for internal, non-pcp variants of free_pages(). */ typedef int __bitwise fpi_t; diff --git a/mm/page_io.c b/mm/page_io.c index 0bf8e40f4e57..f8c26092e869 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -26,6 +26,7 @@ #include #include #include +#include "swap.h" void end_swap_bio_write(struct bio *bio) { diff --git a/mm/shmem.c b/mm/shmem.c index a09b29ec2b45..c8b8819fe2e6 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -38,6 +38,7 @@ #include #include #include +#include "swap.h" static struct vfsmount *shm_mnt; diff --git a/mm/swap.h b/mm/swap.h new file mode 100644 index 000000000000..13e72a5023aa --- /dev/null +++ b/mm/swap.h @@ -0,0 +1,129 @@ + +#ifdef CONFIG_SWAP +#include /* for bio_end_io_t */ + +/* linux/mm/page_io.c */ +int swap_readpage(struct page *page, bool do_poll); +int swap_writepage(struct page *page, struct writeback_control *wbc); +void end_swap_bio_write(struct bio *bio); +int __swap_writepage(struct page *page, struct writeback_control *wbc, + bio_end_io_t end_write_func); + +/* linux/mm/swap_state.c */ +/* One swap address space for each 64M swap space */ +#define SWAP_ADDRESS_SPACE_SHIFT 14 +#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) +extern struct address_space *swapper_spaces[]; +#define swap_address_space(entry) \ + (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ + >> SWAP_ADDRESS_SPACE_SHIFT]) + +void show_swap_cache_info(void); +int add_to_swap(struct page *page); +void *get_shadow_from_swap_cache(swp_entry_t entry); +int add_to_swap_cache(struct page *page, swp_entry_t entry, + gfp_t gfp, void **shadowp); +void __delete_from_swap_cache(struct page *page, + swp_entry_t entry, void *shadow); +void delete_from_swap_cache(struct page *); +void clear_shadow_from_swap_cache(int type, unsigned long begin, + unsigned long end); +void free_swap_cache(struct page *); +struct page *lookup_swap_cache(swp_entry_t entry, + struct vm_area_struct *vma, + unsigned long addr); +struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); + +struct page *read_swap_cache_async(swp_entry_t, gfp_t, + struct vm_area_struct *vma, + unsigned long addr, + bool do_poll); +struct page *__read_swap_cache_async(swp_entry_t, gfp_t, + struct vm_area_struct *vma, + unsigned long addr, + bool *new_page_allocated); +struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, + struct vm_fault *vmf); +struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, + struct vm_fault *vmf); + +#else /* CONFIG_SWAP */ +static inline int swap_readpage(struct page *page, bool do_poll) +{ + return 0; +} + +static inline struct address_space *swap_address_space(swp_entry_t entry) +{ + return NULL; +} + +static inline void free_swap_cache(struct page *page) +{ +} + +static inline void show_swap_cache_info(void) +{ +} + +static inline struct page *swap_cluster_readahead(swp_entry_t entry, + gfp_t gfp_mask, struct vm_fault *vmf) +{ + return NULL; +} + +static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, + struct vm_fault *vmf) +{ + return NULL; +} + +static inline int swap_writepage(struct page *p, struct writeback_control *wbc) +{ + return 0; +} + +static inline struct page *lookup_swap_cache(swp_entry_t swp, + struct vm_area_struct *vma, + unsigned long addr) +{ + return NULL; +} + +static inline +struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) +{ + return find_get_page(mapping, index); +} + +static inline int add_to_swap(struct page *page) +{ + return 0; +} + +static inline void *get_shadow_from_swap_cache(swp_entry_t entry) +{ + return NULL; +} + +static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, + gfp_t gfp_mask, void **shadowp) +{ + return -1; +} + +static inline void __delete_from_swap_cache(struct page *page, + swp_entry_t entry, void *shadow) +{ +} + +static inline void delete_from_swap_cache(struct page *page) +{ +} + +static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, + unsigned long end) +{ +} + +#endif /* CONFIG_SWAP */ diff --git a/mm/swap_state.c b/mm/swap_state.c index 8d4104242100..bb38453425c7 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -23,6 +23,7 @@ #include #include #include "internal.h" +#include "swap.h" /* * swapper_space is a fiction, retained to simplify the path through diff --git a/mm/swapfile.c b/mm/swapfile.c index bf0df7aa7158..71c7a31dd291 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -44,6 +44,7 @@ #include #include #include +#include "swap.h" static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); diff --git a/mm/util.c b/mm/util.c index 7e43369064c8..619697e3d935 100644 --- a/mm/util.c +++ b/mm/util.c @@ -27,6 +27,7 @@ #include #include "internal.h" +#include "swap.h" /** * kfree_const - conditionally free memory diff --git a/mm/vmscan.c b/mm/vmscan.c index 090bfb605ecf..5c734ffc6057 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -58,6 +58,7 @@ #include #include "internal.h" +#include "swap.h" #define CREATE_TRACE_POINTS #include From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736819 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9DBFC433EF for ; Mon, 7 Feb 2022 04:47:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 39CFD6B0074; Sun, 6 Feb 2022 23:47:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 34CE96B0075; Sun, 6 Feb 2022 23:47:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1EDA36B0078; Sun, 6 Feb 2022 23:47:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 115C16B0074 for ; Sun, 6 Feb 2022 23:47:03 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id BA1E4181E4910 for ; Mon, 7 Feb 2022 04:47:02 +0000 (UTC) X-FDA: 79114749084.14.97F1DB4 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf22.hostedemail.com (Postfix) with ESMTP id 3EB35C0003 for ; Mon, 7 Feb 2022 04:47:02 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 39DA0210EA; Mon, 7 Feb 2022 04:47:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209221; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3r8zsBbPzlbUX6ihPzKY8ExxG48IN+l0THpUms1TqYM=; b=UtZfd3blxXJDljhegPqUt1T8a0XNn86n8kcT10KrYnlGH5mPQ9EbVWk9J+FFYsJoeL3hGl UD17NrMjxiyR2BQJTe+Iz7tXl4JNwmaVyVcPmskjhGF/6Siwd6+Bq9eMbymlMRm1hJDgzC YeVz/60dCfGVtNKQud3BFbvUY8iIDlY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209221; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3r8zsBbPzlbUX6ihPzKY8ExxG48IN+l0THpUms1TqYM=; b=NUMtiBtL6cZig7Sxd4ypS7Jn0vCqv1VQV9qag2wwmvgvOS+6PR4aq/Pz9mKtlWCmqrcSOo Axz4qktLGCHSLrDQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id AAA1F1330E; Mon, 7 Feb 2022 04:46:57 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 0ALXGUGkAGIRNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:46:57 +0000 Subject: [PATCH 02/21] MM: drop swap_set_page_dirty From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916112.29374.16921074216162707434.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspam-User: nil X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 3EB35C0003 X-Stat-Signature: 5ff8n9zhkd7d64tezr8gzyjmq8ftpamo Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=UtZfd3bl; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=NUMtiBtL; spf=pass (imf22.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1644209222-389969 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Pages that are written to swap are owned by the MM subsystem - not any filesystem. When such a page is passed to a filesystem to be written out to a swap-file, the filesystem handles the data, but the page itself does not belong to the filesystem. So calling the filesystem's set_page_dirty address_space operation makes no sense. This is for pages in the given address space, and a page to be written to swap does not exist in the given address space. So drop swap_set_page_dirty() which calls the address-space's set_page_dirty, and alway use __set_page_dirty_no_writeback, which is appropriate for pages being swapped out. Fixes-no-auto-backport: 62c230bc1790 ("mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages") Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- include/linux/swap.h | 1 - mm/page_io.c | 14 -------------- mm/swap_state.c | 2 +- 3 files changed, 1 insertion(+), 16 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 3f54a8941c9d..a43929f7033e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -419,7 +419,6 @@ extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP -extern int swap_set_page_dirty(struct page *page); int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, diff --git a/mm/page_io.c b/mm/page_io.c index f8c26092e869..34b12d6f94d7 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -438,17 +438,3 @@ int swap_readpage(struct page *page, bool synchronous) delayacct_swapin_end(); return ret; } - -int swap_set_page_dirty(struct page *page) -{ - struct swap_info_struct *sis = page_swap_info(page); - - if (data_race(sis->flags & SWP_FS_OPS)) { - struct address_space *mapping = sis->swap_file->f_mapping; - - VM_BUG_ON_PAGE(!PageSwapCache(page), page); - return mapping->a_ops->set_page_dirty(page); - } else { - return __set_page_dirty_no_writeback(page); - } -} diff --git a/mm/swap_state.c b/mm/swap_state.c index bb38453425c7..514b86b05488 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -31,7 +31,7 @@ */ static const struct address_space_operations swap_aops = { .writepage = swap_writepage, - .set_page_dirty = swap_set_page_dirty, + .set_page_dirty = __set_page_dirty_no_writeback, #ifdef CONFIG_MIGRATION .migratepage = migrate_page, #endif From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736820 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1AF75C433F5 for ; Mon, 7 Feb 2022 04:47:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AC9716B0075; Sun, 6 Feb 2022 23:47:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A788F6B0078; Sun, 6 Feb 2022 23:47:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 919E56B007B; Sun, 6 Feb 2022 23:47:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0028.hostedemail.com [216.40.44.28]) by kanga.kvack.org (Postfix) with ESMTP id 82FAB6B0075 for ; Sun, 6 Feb 2022 23:47:10 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 45EF7181E2535 for ; Mon, 7 Feb 2022 04:47:10 +0000 (UTC) X-FDA: 79114749420.15.FEC77A9 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf29.hostedemail.com (Postfix) with ESMTP id C4EB4120004 for ; Mon, 7 Feb 2022 04:47:09 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id C173D210EA; Mon, 7 Feb 2022 04:47:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209228; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jIwc70WfyZePk+M4TY2KFFjSotnhhDywYz77GPeU54I=; b=ikz+i/JzVV65yxb2LUs6PuJG5e71Z8cf0FCBCTwmQXIBBaWz7WzKAfUY7+5dxqv3EeamNP vpVguyYOVN3GdY74xr7qwgFI/YZ2d0mfWArhal/iwFqps1IacbWE5EGygBB7cXVhUTyGXV TxOxx4jDPhTUzTesTOJxRzKEX5wg/aY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209228; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jIwc70WfyZePk+M4TY2KFFjSotnhhDywYz77GPeU54I=; b=TDjRpTiE1Vmtdb2gZ+xE1034/7dr8Wq8HP7cTr9R85FgeZ86lrzm4X7/YxOcgW/9zP4kQi jZERoqE82s77CkDw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id DF1A91330E; Mon, 7 Feb 2022 04:47:05 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id O7oOJ0mkAGIXNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:05 +0000 Subject: [PATCH 03/21] MM: move responsibility for setting SWP_FS_OPS to ->swap_activate From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916113.29374.18179900562061291108.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: C4EB4120004 X-Stat-Signature: s65yfw3c73h35794oqcnxo17cn493a3g Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="ikz+i/Jz"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=TDjRpTiE; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf29.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-HE-Tag: 1644209229-883923 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If a filesystem wishes to handle all swap IO itself (via ->direct_IO and ->readpage), rather than just providing devices addresses for submit_bio(), SWP_FS_OPS must be set. Currently the protocol for setting this it to have ->swap_activate return zero. In that case SWP_FS_OPS is set, and add_swap_extent() is called for the entire file. This is a little clumsy as different return values for ->swap_activate have quite different meanings, and it makes it hard to search for which filesystems require SWP_FS_OPS to be set. So remove the special meaning of a zero return, and require the filesystem to set SWP_FS_OPS if it so desires, and to always call add_swap_extent() as required. Currently only NFS and CIFS return zero for add_swap_extent(). Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- fs/cifs/file.c | 3 ++- fs/nfs/file.c | 13 +++++++++++-- include/linux/swap.h | 6 ++++++ mm/swapfile.c | 10 +++------- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index e7af802dcfa6..fe49f1cab018 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4917,7 +4917,8 @@ static int cifs_swap_activate(struct swap_info_struct *sis, * from reading or writing the file */ - return 0; + sis->flags |= SWP_FS_OPS; + return add_swap_extent(sis, 0, sis->max, 0); } static void cifs_swap_deactivate(struct file *file) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 76d76acbc594..d5aa55c7edb0 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -488,6 +488,7 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, { unsigned long blocks; long long isize; + int ret; struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); struct inode *inode = file->f_mapping->host; @@ -500,9 +501,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, return -EINVAL; } + ret = rpc_clnt_swap_activate(clnt); + if (ret) + return ret; + ret = add_swap_extent(sis, 0, sis->max, 0); + if (ret < 0) { + rpc_clnt_swap_deactivate(clnt); + return ret; + } *span = sis->pages; - - return rpc_clnt_swap_activate(clnt); + sis->flags |= SWP_FS_OPS; + return ret; } static void nfs_swap_deactivate(struct file *file) diff --git a/include/linux/swap.h b/include/linux/swap.h index a43929f7033e..b57cff3c5ac2 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -573,6 +573,12 @@ static inline swp_entry_t get_swap_page(struct page *page) return entry; } +static inline int add_swap_extent(struct swap_info_struct *sis, + unsigned long start_page, + unsigned long nr_pages, sector_t start_block) +{ + return -EINVAL; +} #endif /* CONFIG_SWAP */ #ifdef CONFIG_THP_SWAP diff --git a/mm/swapfile.c b/mm/swapfile.c index 71c7a31dd291..ed6028aea8bf 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2347,13 +2347,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) if (mapping->a_ops->swap_activate) { ret = mapping->a_ops->swap_activate(sis, swap_file, span); - if (ret >= 0) - sis->flags |= SWP_ACTIVATED; - if (!ret) { - sis->flags |= SWP_FS_OPS; - ret = add_swap_extent(sis, 0, sis->max, 0); - *span = sis->pages; - } + if (ret < 0) + return ret; + sis->flags |= SWP_ACTIVATED; return ret; } From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736821 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F901C433F5 for ; Mon, 7 Feb 2022 04:47:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 39D166B0078; Sun, 6 Feb 2022 23:47:17 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 34C4E6B007B; Sun, 6 Feb 2022 23:47:17 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 215A56B007D; Sun, 6 Feb 2022 23:47:17 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0095.hostedemail.com [216.40.44.95]) by kanga.kvack.org (Postfix) with ESMTP id 146886B0078 for ; Sun, 6 Feb 2022 23:47:17 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id D385C181E2535 for ; Mon, 7 Feb 2022 04:47:16 +0000 (UTC) X-FDA: 79114749672.30.3458A90 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf28.hostedemail.com (Postfix) with ESMTP id 549C1C0006 for ; Mon, 7 Feb 2022 04:47:16 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 5D4771F37F; Mon, 7 Feb 2022 04:47:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209235; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aJeimCLV8LLTUkUjAiRuHMND8QpvrJbpAXji1B8C5fw=; b=eiXhvuURk6X3oyWXqxv/Z7XIKjVfwAB5btUvlgkYVv6xaYKcPvxlZPihlbJFkFb1vePjB9 oQEf/+ABDHHEGwAzoG11bbuQjSHu9PVN6qlAm+5QIm8FFLJov1f7iKbAHdf1Vl1Wr3z3eO ZeEo9g0fSq0uP6ose9eVRKTW5PXVDR4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209235; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aJeimCLV8LLTUkUjAiRuHMND8QpvrJbpAXji1B8C5fw=; b=SIQFoD9QKd0q3Ew15UnxNqGawIo+F+FJ8e2Bg6kfLHU/87iovD0eaiPVHn8ne9JzvZbe9k whcYQiYZJgiLdYBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 49A9B1330E; Mon, 7 Feb 2022 04:47:12 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 5j58AVCkAGIcNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:12 +0000 Subject: [PATCH 04/21] MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916113.29374.6507434037983544219.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 549C1C0006 X-Stat-Signature: f1fdbznms87kt4u8z1oidzxcadw4xqi3 Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=eiXhvuUR; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=SIQFoD9Q; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf28.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-HE-Tag: 1644209236-23217 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If swap-out is using filesystem operations (SWP_FS_OPS), then it is not safe to enter the FS for reclaim. So only down-grade the requirement for swap pages to __GFP_IO after checking that SWP_FS_OPS are not being used. This makes the calculation of "may_enter_fs" slightly more complex, so move it into a separate function. with that done, there is little value in maintaining the bool variable any more. So replace the may_enter_fs variable with a may_enter_fs() function. This removes any risk for the variable becoming out-of-date. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- mm/swap.h | 8 ++++++++ mm/vmscan.c | 29 ++++++++++++++++++++--------- 2 files changed, 28 insertions(+), 9 deletions(-) diff --git a/mm/swap.h b/mm/swap.h index 13e72a5023aa..5c676e55f288 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -47,6 +47,10 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); +static inline unsigned int page_swap_flags(struct page *page) +{ + return page_swap_info(page)->flags; +} #else /* CONFIG_SWAP */ static inline int swap_readpage(struct page *page, bool do_poll) { @@ -126,4 +130,8 @@ static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, { } +static inline unsigned int page_swap_flags(struct page *page) +{ + return 0; +} #endif /* CONFIG_SWAP */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 5c734ffc6057..ad5026d06aa8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1506,6 +1506,22 @@ static unsigned int demote_page_list(struct list_head *demote_pages, return nr_succeeded; } +static bool may_enter_fs(struct page *page, gfp_t gfp_mask) +{ + if (gfp_mask & __GFP_FS) + return true; + if (!PageSwapCache(page) || !(gfp_mask & __GFP_IO)) + return false; + /* + * We can "enter_fs" for swap-cache with only __GFP_IO + * providing this isn't SWP_FS_OPS. + * ->flags can be updated non-atomicially (scan_swap_map_slots), + * but that will never affect SWP_FS_OPS, so the data_race + * is safe. + */ + return !data_race(page_swap_flags(page) & SWP_FS_OPS); +} + /* * shrink_page_list() returns the number of reclaimed pages */ @@ -1531,7 +1547,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, struct address_space *mapping; struct page *page; enum page_references references = PAGEREF_RECLAIM; - bool dirty, writeback, may_enter_fs; + bool dirty, writeback; unsigned int nr_pages; cond_resched(); @@ -1555,9 +1571,6 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (!sc->may_unmap && page_mapped(page)) goto keep_locked; - may_enter_fs = (sc->gfp_mask & __GFP_FS) || - (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO)); - /* * The number of dirty pages determines if a node is marked * reclaim_congested. kswapd will stall and start writing @@ -1602,7 +1615,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, * not to fs). In this case mark the page for immediate * reclaim and continue scanning. * - * Require may_enter_fs because we would wait on fs, which + * Require may_enter_fs() because we would wait on fs, which * may not have submitted IO yet. And the loop driver might * enter reclaim, and deadlock if it waits on a page for * which it is needed to do the write (loop masks off @@ -1634,7 +1647,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, /* Case 2 above */ } else if (writeback_throttling_sane(sc) || - !PageReclaim(page) || !may_enter_fs) { + !PageReclaim(page) || !may_enter_fs(page, sc->gfp_mask)) { /* * This is slightly racy - end_page_writeback() * might have just cleared PageReclaim, then @@ -1724,8 +1737,6 @@ static unsigned int shrink_page_list(struct list_head *page_list, goto activate_locked_split; } - may_enter_fs = true; - /* Adding to swap updated mapping */ mapping = page_mapping(page); } @@ -1795,7 +1806,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (references == PAGEREF_RECLAIM_CLEAN) goto keep_locked; - if (!may_enter_fs) + if (!may_enter_fs(page, sc->gfp_mask)) goto keep_locked; if (!sc->may_writepage) goto keep_locked; From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736822 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 562F2C433F5 for ; Mon, 7 Feb 2022 04:47:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D60B26B007B; Sun, 6 Feb 2022 23:47:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D104F6B007D; Sun, 6 Feb 2022 23:47:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BB2396B007E; Sun, 6 Feb 2022 23:47:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0111.hostedemail.com [216.40.44.111]) by kanga.kvack.org (Postfix) with ESMTP id AD2CF6B007B for ; Sun, 6 Feb 2022 23:47:23 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 71EBB181E2535 for ; Mon, 7 Feb 2022 04:47:23 +0000 (UTC) X-FDA: 79114749966.14.39690DC Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf26.hostedemail.com (Postfix) with ESMTP id EA03C140002 for ; Mon, 7 Feb 2022 04:47:22 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E37A21F37F; Mon, 7 Feb 2022 04:47:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209241; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jyz5xhilYTbOS68r4Pp67NqHZnxNuWBezq4E90V3yoQ=; b=GH5dWRT0aPMUsYVliRZgqPKpgvXeKjlwN9CDza0rePVus9SjojH7tbZqt0va03EAaEitzP dWBDDo11Mx2t+W2ZdnpbPmIp1VZPHmgJ/PxGlFzYQMDKdCZV5bTKDNx53P5t3G+SGC8ayE hVD79VFV7X8HK0etsqtZ+5vjEkWFI0M= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209241; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jyz5xhilYTbOS68r4Pp67NqHZnxNuWBezq4E90V3yoQ=; b=stc821HomjBUPHBRoCIjj/x3Ut0zzl24RdIlaXNuqlqm8iZeLntJnMw20MABdWmTs8Nx8g oMQ1JEc3S7l/i8AQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D3C471330E; Mon, 7 Feb 2022 04:47:18 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 8aI7I1akAGIkNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:18 +0000 Subject: [PATCH 05/21] MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916114.29374.8467297159548469912.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: EA03C140002 X-Stat-Signature: 86a96sw8ziexwsmn3wqe57o371npz8kf Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=GH5dWRT0; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=stc821Ho; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf26.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-HE-Tag: 1644209242-822595 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not most efficient. swap uses ->direct_IO for writes which while this is adequate is an inappropriate over-loading. ->direct_IO may need to had handle allocate space for holes or other details that are not relevant for swap. So this patch introduces a new address_space operation: ->swap_rw. In this patch it is used for reads, and a subsequent patch will switch writes to use it. No filesystem yet supports ->swap_rw, but that is not a problem because no filesystem actually works with filesystem-based swap. Only two filesystems set SWP_FS_OPS: - cifs sets the flag, but ->direct_IO always fails so swap cannot work. - nfs sets the flag, but ->direct_IO calls generic_write_checks() which has failed on swap files for several releases. To ensure that a NULL ->swap_rw isn't called, ->activate_swap() for both NFS and cifs are changed to fail if ->swap_rw is not set. This can be removed if/when the function is added. Future patches will restore swap-over-NFS functionality. To submit an async read with ->swap_rw() we need to allocate a structure to hold the kiocb and other details. swap_readpage() cannot handle transient failure, so we create a mempool to provide the structures. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- fs/cifs/file.c | 4 +++ fs/nfs/file.c | 4 +++ include/linux/fs.h | 1 + mm/page_io.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++----- mm/swap.h | 1 + mm/swapfile.c | 5 ++++ 6 files changed, 77 insertions(+), 6 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index fe49f1cab018..6ae5b404b04b 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4889,6 +4889,10 @@ static int cifs_swap_activate(struct swap_info_struct *sis, cifs_dbg(FYI, "swap activate\n"); + if (!swap_file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; diff --git a/fs/nfs/file.c b/fs/nfs/file.c index d5aa55c7edb0..3dbef2c31567 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -492,6 +492,10 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); struct inode *inode = file->f_mapping->host; + if (!file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; diff --git a/include/linux/fs.h b/include/linux/fs.h index e2d892b201b0..57e3b387cb17 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -409,6 +409,7 @@ struct address_space_operations { int (*swap_activate)(struct swap_info_struct *sis, struct file *file, sector_t *span); void (*swap_deactivate)(struct file *file); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; extern const struct address_space_operations empty_aops; diff --git a/mm/page_io.c b/mm/page_io.c index 34b12d6f94d7..e90a3231f225 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -284,6 +284,25 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) #define bio_associate_blkg_from_page(bio, page) do { } while (0) #endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */ +struct swap_iocb { + struct kiocb iocb; + struct bio_vec bvec; +}; +static mempool_t *sio_pool; + +int sio_pool_init(void) +{ + if (!sio_pool) { + mempool_t *pool = mempool_create_kmalloc_pool( + SWAP_CLUSTER_MAX, sizeof(struct swap_iocb)); + if (cmpxchg(&sio_pool, NULL, pool)) + mempool_destroy(pool); + } + if (!sio_pool) + return -ENOMEM; + return 0; +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -355,6 +374,48 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +static void sio_read_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != 0 && ret != PAGE_SIZE) { + SetPageError(page); + ClearPageUptodate(page); + pr_alert_ratelimited("Read-error on swap-device\n"); + } else { + SetPageUptodate(page); + count_vm_event(PSWPIN); + } + unlock_page(page); + mempool_free(sio, sio_pool); +} + +static int swap_readpage_fs(struct page *page) +{ + struct swap_info_struct *sis = page_swap_info(page); + struct file *swap_file = sis->swap_file; + struct address_space *mapping = swap_file->f_mapping; + struct iov_iter from; + struct swap_iocb *sio; + loff_t pos = page_file_offset(page); + int ret; + + sio = mempool_alloc(sio_pool, GFP_KERNEL); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + + iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); + return ret; +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -381,12 +442,7 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - - ret = mapping->a_ops->readpage(swap_file, page); - if (!ret) - count_vm_event(PSWPIN); + ret = swap_readpage_fs(page); goto out; } diff --git a/mm/swap.h b/mm/swap.h index 5c676e55f288..e8ee995cf8d8 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -3,6 +3,7 @@ #include /* for bio_end_io_t */ /* linux/mm/page_io.c */ +int sio_pool_init(void); int swap_readpage(struct page *page, bool do_poll); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); diff --git a/mm/swapfile.c b/mm/swapfile.c index ed6028aea8bf..c800c17bf0c8 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2350,6 +2350,11 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) if (ret < 0) return ret; sis->flags |= SWP_ACTIVATED; + if ((sis->flags & SWP_FS_OPS) && + sio_pool_init() != 0) { + destroy_swap_extents(sis); + return -ENOMEM; + } return ret; } From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736823 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7CC60C433EF for ; Mon, 7 Feb 2022 04:47:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0B61E6B007D; Sun, 6 Feb 2022 23:47:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0675A6B007E; Sun, 6 Feb 2022 23:47:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E70326B0080; Sun, 6 Feb 2022 23:47:32 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0019.hostedemail.com [216.40.44.19]) by kanga.kvack.org (Postfix) with ESMTP id DA08D6B007D for ; Sun, 6 Feb 2022 23:47:32 -0500 (EST) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 91AC88249980 for ; Mon, 7 Feb 2022 04:47:32 +0000 (UTC) X-FDA: 79114750344.04.9DD7266 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf07.hostedemail.com (Postfix) with ESMTP id 22B2E40008 for ; Mon, 7 Feb 2022 04:47:31 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 2AF71210E8; Mon, 7 Feb 2022 04:47:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209251; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=upnGZr9DzucgqdKjTzI9I/R+ASk480fPjcWmZZiwDRY=; b=SG891wRXWq1jkKISq7BPJG1f6xCq3wxjXiDuucGwsJONxcifDBmOkjlq1mbzUkhjxRabkc lkZXaWet6Klk/Gn0fJ4tSID9+1F93jA5D66R5Ux2sPENSQN+4B+ooTidztTGFYZM+mHyxn urF3J8bLqw9W9YU4kZtdK0JhleECDK0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209251; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=upnGZr9DzucgqdKjTzI9I/R+ASk480fPjcWmZZiwDRY=; b=ahsNFsl/G2L8fbkq/4MEVV7Yp3lOr4T05+/AHEWnbLYvTCJJrGXZh0XtcCPkSmF6cTJeyq Pr6GZy9s7y8IsHDg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2E8731330E; Mon, 7 Feb 2022 04:47:27 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id X2MIN1+kAGI0NQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:27 +0000 Subject: [PATCH 06/21] MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916115.29374.13389081720024297304.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: 8wkbyhzsws187hjwyye5qojd7cfpnjab X-Rspam-User: nil Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=SG891wRX; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="ahsNFsl/"; spf=pass (imf07.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 22B2E40008 X-HE-Tag: 1644209251-481554 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch switches swap-out to SWP_FS_OPS swap-spaces to use ->swap_rw and makes the writes asynchronous, like they are for other swap spaces. To make it async we need to allocate the kiocb struct from a mempool. This may block, but won't block as long as waiting for the write to complete. At most it will wait for some previous swap IO to complete. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- mm/page_io.c | 93 +++++++++++++++++++++++++++++++++------------------------- 1 file changed, 53 insertions(+), 40 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index e90a3231f225..f391846ea82a 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -303,6 +303,57 @@ int sio_pool_init(void) return 0; } +static void sio_write_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != PAGE_SIZE) { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * folio_rotate_reclaimable but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. + */ + set_page_dirty(page); + ClearPageReclaim(page); + pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", + ret, page_file_offset(page)); + } else + count_vm_event(PSWPOUT); + end_page_writeback(page); + mempool_free(sio, sio_pool); +} + +static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) +{ + struct swap_iocb *sio; + struct swap_info_struct *sis = page_swap_info(page); + struct file *swap_file = sis->swap_file; + struct address_space *mapping = swap_file->f_mapping; + struct iov_iter from; + int ret; + + set_page_writeback(page); + unlock_page(page); + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = page_file_offset(page); + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); + return ret; +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -311,46 +362,8 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, struct swap_info_struct *sis = page_swap_info(page); VM_BUG_ON_PAGE(!PageSwapCache(page), page); - if (data_race(sis->flags & SWP_FS_OPS)) { - struct kiocb kiocb; - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct bio_vec bv = { - .bv_page = page, - .bv_len = PAGE_SIZE, - .bv_offset = 0 - }; - struct iov_iter from; - - iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE); - init_sync_kiocb(&kiocb, swap_file); - kiocb.ki_pos = page_file_offset(page); - - set_page_writeback(page); - unlock_page(page); - ret = mapping->a_ops->direct_IO(&kiocb, &from); - if (ret == PAGE_SIZE) { - count_vm_event(PSWPOUT); - ret = 0; - } else { - /* - * In the case of swap-over-nfs, this can be a - * temporary failure if the system has limited - * memory for allocating transmit buffers. - * Mark the page dirty and avoid - * folio_rotate_reclaimable but rate-limit the - * messages but do not flag PageError like - * the normal direct-to-bio case as it could - * be temporary. - */ - set_page_dirty(page); - ClearPageReclaim(page); - pr_err_ratelimited("Write error on dio swapfile (%llu)\n", - page_file_offset(page)); - } - end_page_writeback(page); - return ret; - } + if (data_race(sis->flags & SWP_FS_OPS)) + return swap_writepage_fs(page, wbc); ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc); if (!ret) { From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736824 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B457CC433EF for ; Mon, 7 Feb 2022 04:47:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 45F976B007E; Sun, 6 Feb 2022 23:47:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 410516B0080; Sun, 6 Feb 2022 23:47:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2B0836B0081; Sun, 6 Feb 2022 23:47:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 1DD726B007E for ; Sun, 6 Feb 2022 23:47:40 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id B145120229 for ; Mon, 7 Feb 2022 04:47:39 +0000 (UTC) X-FDA: 79114750638.02.C96C14C Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf30.hostedemail.com (Postfix) with ESMTP id 34BBD80004 for ; Mon, 7 Feb 2022 04:47:39 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 3BE93210E8; Mon, 7 Feb 2022 04:47:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209258; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MLq8cIoheOmhzxM6Ogb6L1iI5r135ThxZduFj4O+MX4=; b=D2LJ2F3Giw8Z3g8Jq66FT5PmWfId4gWMlLSJ/X14eKARqcftkNZxOXTxYGm2yhzRkclci1 fHWl82Xfq3kypIZq6aCrJ5GU9hFH7j0Jqr8d+VVURaLxgfVGG4fXidTdV7OqLLApPrdJU/ DrZ+j1X1gf128OjcUfPkxU3ugD0Qb+0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209258; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MLq8cIoheOmhzxM6Ogb6L1iI5r135ThxZduFj4O+MX4=; b=trZTPFVCVLG9BmJczeFmm71FNWRYNUZc9dSVWJRIrsVe/dkNvh5YjKSMi4JX8yychFxHJX cwdN8/N3rc5yVSAA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 579E31330E; Mon, 7 Feb 2022 04:47:35 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id lFytBWekAGI7NQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:35 +0000 Subject: [PATCH 07/21] DOC: update documentation for swap_activate and swap_rw From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916116.29374.15552732854261345798.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 34BBD80004 X-Rspam-User: nil Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=D2LJ2F3G; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=trZTPFVC; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf30.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Stat-Signature: twkuoxg56aus3de5euodejq6xbndce1z X-HE-Tag: 1644209259-247553 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This documentation for ->swap_activate() has been out-of-date for a long time. This patch updates it to match recent changes, and adds documentation for the associated ->swap_rw() Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- Documentation/filesystems/locking.rst | 18 ++++++++++++------ Documentation/filesystems/vfs.rst | 17 ++++++++++++----- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 3f9b1497ebb8..fbb10378d5ee 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -260,8 +260,9 @@ prototypes:: int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); int (*error_remove_page)(struct address_space *, struct page *); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); locking rules: All except set_page_dirty and freepage may block @@ -290,6 +291,7 @@ is_partially_uptodate: yes error_remove_page: yes swap_activate: no swap_deactivate: no +swap_rw: yes, unlocks ====================== ======================== ========= =============== ->write_begin(), ->write_end() and ->readpage() may be called from @@ -392,15 +394,19 @@ cleaned, or an error value if not. Note that in order to prevent the page getting mapped back in and redirtied, it needs to be kept locked across the entire operation. -->swap_activate will be called with a non-zero argument on -files backing (non block device backed) swapfiles. A return value -of zero indicates success, in which case this file can be used for -backing swapspace. The swapspace operations will be proxied to the -address space operations. +->swap_activate() will be called to prepare the given file for swap. It +should perform any validation and preparation necessary to ensure that +writes can be performed with minimal memory allocation. It should call +add_swap_extent(), or the helper iomap_swapfile_activate(), and return +the number of extents added. If IO should be submitted through +->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted +directly to the block device ``sis->bdev``. ->swap_deactivate() will be called in the sys_swapoff() path after ->swap_activate() returned success. +->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate(). + file_lock_operations ==================== diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index bf5c48066fac..779d23fc7954 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -751,8 +751,9 @@ cache in your filesystem. The following members are defined: unsigned long); void (*is_dirty_writeback) (struct page *, bool *, bool *); int (*error_remove_page) (struct mapping *mapping, struct page *page); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; ``writepage`` @@ -959,15 +960,21 @@ cache in your filesystem. The following members are defined: unless you have them locked or reference counts increased. ``swap_activate`` - Called when swapon is used on a file to allocate space if - necessary and pin the block lookup information in memory. A - return value of zero indicates success, in which case this file - can be used to back swapspace. + + Called to prepare the given file for swap. It should perform + any validation and preparation necessary to ensure that writes + can be performed with minimal memory allocation. It should call + add_swap_extent(), or the helper iomap_swapfile_activate(), and + return the number of extents added. If IO should be submitted + through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will + be submitted directly to the block device ``sis->bdev``. ``swap_deactivate`` Called during swapoff on files where swap_activate was successful. +``swap_rw`` + Called to read or write swap pages when SWP_FS_OPS is set. The File Object =============== From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736825 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F4E7C433F5 for ; Mon, 7 Feb 2022 04:47:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2DA016B0080; Sun, 6 Feb 2022 23:47:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 288C06B0081; Sun, 6 Feb 2022 23:47:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1027F6B0082; Sun, 6 Feb 2022 23:47:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0171.hostedemail.com [216.40.44.171]) by kanga.kvack.org (Postfix) with ESMTP id 014096B0080 for ; Sun, 6 Feb 2022 23:47:47 -0500 (EST) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id BD2B4181E2535 for ; Mon, 7 Feb 2022 04:47:47 +0000 (UTC) X-FDA: 79114750974.31.C1B2B1C Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf19.hostedemail.com (Postfix) with ESMTP id 30ABE1A0003 for ; Mon, 7 Feb 2022 04:47:47 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 29DE0210E2; Mon, 7 Feb 2022 04:47:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209266; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sf0NsMjpJ6R5/YWWRrfEUVzU3CgvElzo/eR/uy6xoiA=; b=1JswmUAIp58wbLKcujZDErWqTMypvdyYgTvqcaXfFlupDLrHi7/6HKdmDO8qUXq+oRAYFs fjHnFRSBiOsusTaIdclLcAGHHIqD9Xq09J7Yxcmlllcc9kvvboefYL7q/a3tDe5Di2Jz7q EPkOUAvVGlGIZFKUF1YcSWlnTbc0WhI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209266; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sf0NsMjpJ6R5/YWWRrfEUVzU3CgvElzo/eR/uy6xoiA=; b=KhgXhjYvSGuUtS+JMbfPXB/VPY2sv+UP2JagXVrJ/r4Yy7726NDRvT7pWlTrEiYBuG+rz1 BG8b9tsdmebj55Dg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 0CA7D1330E; Mon, 7 Feb 2022 04:47:42 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 6ptJL26kAGJINQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:42 +0000 Subject: [PATCH 08/21] MM: submit multipage reads for SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916116.29374.1808343288510917899.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 30ABE1A0003 X-Stat-Signature: e7he9jmsdsea6o7gzakuy8d16ninhbn8 Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=1JswmUAI; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=KhgXhjYv; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf19.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspam-User: nil X-HE-Tag: 1644209267-394528 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: swap_readpage() is given one page at a time, but may be called repeatedly in succession. For block-device swap-space, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads. With this patch we pass in a pointer-to-pointer when swap_readpage can store state between calls - much like the effect of blk_plug. After calling swap_readpage() some number of times, the state will be passed to swap_read_unplug() which can submit the combined request. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- mm/madvise.c | 8 +++- mm/memory.c | 2 + mm/page_io.c | 102 ++++++++++++++++++++++++++++++++++++------------------- mm/swap.h | 17 ++++++++- mm/swap_state.c | 20 ++++++++--- 5 files changed, 102 insertions(+), 47 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 1ee4b7583379..2b1ab30af141 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -225,6 +225,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + struct swap_iocb *splug = NULL; if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; @@ -246,10 +247,11 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, continue; page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, - vma, index, false); + vma, index, false, &splug); if (page) put_page(page); } + swap_read_unplug(splug); return 0; } @@ -265,6 +267,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start)); pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1); struct page *page; + struct swap_iocb *splug = NULL; rcu_read_lock(); xas_for_each(&xas, page, end_index) { @@ -277,13 +280,14 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, swap = radix_to_swp_entry(page); page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE, - NULL, 0, false); + NULL, 0, false, &splug); if (page) put_page(page); rcu_read_lock(); } rcu_read_unlock(); + swap_read_unplug(splug); lru_add_drain(); /* Push any new pages onto the LRU now */ } diff --git a/mm/memory.c b/mm/memory.c index d25372340107..8bd18c54eaa4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3559,7 +3559,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* To provide entry to swap_readpage() */ set_page_private(page, entry.val); - swap_readpage(page, true); + swap_readpage(page, true, NULL); set_page_private(page, 0); } } else { diff --git a/mm/page_io.c b/mm/page_io.c index f391846ea82a..fc82e8750e9b 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -286,7 +286,8 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) struct swap_iocb { struct kiocb iocb; - struct bio_vec bvec; + struct bio_vec bvec[SWAP_CLUSTER_MAX]; + int pages; }; static mempool_t *sio_pool; @@ -306,7 +307,7 @@ int sio_pool_init(void) static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec.bv_page; + struct page *page = sio->bvec[0].bv_page; if (ret != PAGE_SIZE) { /* @@ -344,10 +345,10 @@ static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) init_sync_kiocb(&sio->iocb, swap_file); sio->iocb.ki_complete = sio_write_complete; sio->iocb.ki_pos = page_file_offset(page); - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + sio->bvec[0].bv_page = page; + sio->bvec[0].bv_len = PAGE_SIZE; + sio->bvec[0].bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); ret = mapping->a_ops->swap_rw(&sio->iocb, &from); if (ret != -EIOCBQUEUED) sio_write_complete(&sio->iocb, ret); @@ -390,46 +391,64 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec.bv_page; + int p; - if (ret != 0 && ret != PAGE_SIZE) { - SetPageError(page); - ClearPageUptodate(page); - pr_alert_ratelimited("Read-error on swap-device\n"); + if (ret == PAGE_SIZE * sio->pages) { + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + SetPageUptodate(page); + unlock_page(page); + } + count_vm_events(PSWPIN, sio->pages); } else { - SetPageUptodate(page); - count_vm_event(PSWPIN); + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + SetPageError(page); + ClearPageUptodate(page); + unlock_page(page); + } + pr_alert_ratelimited("Read-error on swap-device\n"); } - unlock_page(page); mempool_free(sio, sio_pool); } -static int swap_readpage_fs(struct page *page) +static void swap_readpage_fs(struct page *page, + struct swap_iocb **plug) { struct swap_info_struct *sis = page_swap_info(page); - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; loff_t pos = page_file_offset(page); - int ret; - - sio = mempool_alloc(sio_pool, GFP_KERNEL); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_pos = pos; - sio->iocb.ki_complete = sio_read_complete; - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_read_complete(&sio->iocb, ret); - return ret; + if (plug) + sio = *plug; + if (sio) { + if (sio->iocb.ki_filp != sis->swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_read_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_KERNEL); + init_sync_kiocb(&sio->iocb, sis->swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) { + swap_read_unplug(sio); + sio = NULL; + } + if (plug) + *plug = sio; } -int swap_readpage(struct page *page, bool synchronous) +int swap_readpage(struct page *page, bool synchronous, + struct swap_iocb **plug) { struct bio *bio; int ret = 0; @@ -455,7 +474,7 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - ret = swap_readpage_fs(page); + swap_readpage_fs(page, plug); goto out; } @@ -507,3 +526,16 @@ int swap_readpage(struct page *page, bool synchronous) delayacct_swapin_end(); return ret; } + +void __swap_read_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, READ, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); +} diff --git a/mm/swap.h b/mm/swap.h index e8ee995cf8d8..7aab4c82e2d0 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -4,7 +4,15 @@ /* linux/mm/page_io.c */ int sio_pool_init(void); -int swap_readpage(struct page *page, bool do_poll); +struct swap_iocb; +int swap_readpage(struct page *page, bool do_poll, + struct swap_iocb **plug); +void __swap_read_unplug(struct swap_iocb *plug); +static inline void swap_read_unplug(struct swap_iocb *plug) +{ + if (unlikely(plug)) + __swap_read_unplug(plug); +} int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct writeback_control *wbc, @@ -38,7 +46,8 @@ struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); struct page *read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long addr, - bool do_poll); + bool do_poll, + struct swap_iocb **plug); struct page *__read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long addr, @@ -53,7 +62,9 @@ static inline unsigned int page_swap_flags(struct page *page) return page_swap_info(page)->flags; } #else /* CONFIG_SWAP */ -static inline int swap_readpage(struct page *page, bool do_poll) +struct swap_iocb; +static inline int swap_readpage(struct page *page, bool do_poll, + struct swap_iocb **plug) { return 0; } diff --git a/mm/swap_state.c b/mm/swap_state.c index 514b86b05488..c84779e2518b 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -520,14 +520,16 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * the swap entry is no longer in use. */ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, unsigned long addr, bool do_poll) + struct vm_area_struct *vma, + unsigned long addr, bool do_poll, + struct swap_iocb **plug) { bool page_was_allocated; struct page *retpage = __read_swap_cache_async(entry, gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(retpage, do_poll, plug); return retpage; } @@ -621,6 +623,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, unsigned long mask; struct swap_info_struct *si = swp_swap_info(entry); struct blk_plug plug; + struct swap_iocb *splug = NULL; bool do_poll = true, page_allocated; struct vm_area_struct *vma = vmf->vma; unsigned long addr = vmf->address; @@ -647,7 +650,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (offset != entry_offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -656,10 +659,12 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, put_page(page); } blk_finish_plug(&plug); + swap_read_unplug(splug); lru_add_drain(); /* Push any new pages onto the LRU now */ skip: - return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll); + /* The page was likely read above, so no need for plugging here */ + return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, NULL); } int init_swap_address_space(unsigned int type, unsigned long nr_pages) @@ -790,6 +795,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct vm_fault *vmf) { struct blk_plug plug; + struct swap_iocb *splug = NULL; struct vm_area_struct *vma = vmf->vma; struct page *page; pte_t *pte, pentry; @@ -820,7 +826,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (i != ra_info.offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -829,10 +835,12 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, put_page(page); } blk_finish_plug(&plug); + swap_read_unplug(splug); lru_add_drain(); skip: + /* The page was likely read above, so no need for plugging here */ return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, - ra_info.win == 1); + ra_info.win == 1, NULL); } /** From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736826 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F27B2C433F5 for ; Mon, 7 Feb 2022 04:47:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 80A406B0072; Sun, 6 Feb 2022 23:47:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7B85E6B0081; Sun, 6 Feb 2022 23:47:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 659C06B0082; Sun, 6 Feb 2022 23:47:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0221.hostedemail.com [216.40.44.221]) by kanga.kvack.org (Postfix) with ESMTP id 578706B0072 for ; Sun, 6 Feb 2022 23:47:54 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 164E5181E2535 for ; Mon, 7 Feb 2022 04:47:54 +0000 (UTC) X-FDA: 79114751268.16.B939F1E Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf09.hostedemail.com (Postfix) with ESMTP id 94FD2140007 for ; Mon, 7 Feb 2022 04:47:53 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9E792210E2; Mon, 7 Feb 2022 04:47:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209272; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KjaFIy8Q0CoV7YsqAR88nvXmgR16X2+0+fdinEiWJPo=; b=adb/LawklSNgykGRau7cL8ffvn90r8M+5rHNBv/xzkI133R+Icq7tC6ZvRTHQ84Dh9w3wc Nbt/QTXVV7tiLh9KggbAPXldK5dKdp+8NX3m4tYL/ebupzpFPU8Ux71WosKm4dbGgw4gg9 4d1WJ7siLi9SRhm8Gbd72A9zorJ9Bgs= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209272; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KjaFIy8Q0CoV7YsqAR88nvXmgR16X2+0+fdinEiWJPo=; b=KfOaDkPtY3R8ihVkoBneX3ddbKCm/m7jSAVBXUm/bmOGZbhntrEUO3uZuNlsxsog2RkKel lMmYlgvPygFJwGDQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id BDE5F1330E; Mon, 7 Feb 2022 04:47:49 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id fEkZH3WkAGJSNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:49 +0000 Subject: [PATCH 09/21] MM: submit multipage write for SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916117.29374.17897957849434348102.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspam-User: nil X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 94FD2140007 X-Stat-Signature: npska5q3wn5ke85xzi498yzt8xcydxwr Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="adb/Lawk"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=KfOaDkPt; spf=pass (imf09.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1644209273-689627 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: swap_writepage() is given one page at a time, but may be called repeatedly in succession. For block-device swapspace, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads. With this patch we pass a pointer-to-pointer via the wbc. swap_writepage can store state between calls - much like the pointer passed explicitly to swap_readpage. After calling swap_writepage() some number of times, the state will be passed to swap_write_unplug() which can submit the combined request. Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig --- include/linux/writeback.h | 7 ++++ mm/page_io.c | 78 ++++++++++++++++++++++++++++++++------------- mm/swap.h | 4 ++ mm/vmscan.c | 9 ++++- 4 files changed, 74 insertions(+), 24 deletions(-) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index fec248ab1fec..32b35f21cb97 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -80,6 +80,13 @@ struct writeback_control { unsigned punt_to_cgroup:1; /* cgrp punting, see __REQ_CGROUP_PUNT */ + /* To enable batching of swap writes to non-block-device backends, + * "plug" can be set point to a 'struct swap_iocb *'. When all swap + * writes have been submitted, if with swap_iocb is not NULL, + * swap_write_unplug() should be called. + */ + struct swap_iocb **swap_plug; + #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ struct inode *inode; /* inode being written out */ diff --git a/mm/page_io.c b/mm/page_io.c index fc82e8750e9b..7684a8d81dcd 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -308,8 +308,9 @@ static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); struct page *page = sio->bvec[0].bv_page; + int p; - if (ret != PAGE_SIZE) { + if (ret != PAGE_SIZE * sio->pages) { /* * In the case of swap-over-nfs, this can be a * temporary failure if the system has limited @@ -320,43 +321,63 @@ static void sio_write_complete(struct kiocb *iocb, long ret) * the normal direct-to-bio case as it could * be temporary. */ - set_page_dirty(page); - ClearPageReclaim(page); pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", ret, page_file_offset(page)); + for (p = 0; p < sio->pages; p++) { + page = sio->bvec[p].bv_page; + set_page_dirty(page); + ClearPageReclaim(page); + } } else - count_vm_event(PSWPOUT); - end_page_writeback(page); + count_vm_events(PSWPOUT, sio->pages); + + for (p = 0; p < sio->pages; p++) + end_page_writeback(sio->bvec[p].bv_page); + mempool_free(sio, sio_pool); } static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) { - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; struct swap_info_struct *sis = page_swap_info(page); struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; - int ret; + loff_t pos = page_file_offset(page); set_page_writeback(page); unlock_page(page); - sio = mempool_alloc(sio_pool, GFP_NOIO); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_complete = sio_write_complete; - sio->iocb.ki_pos = page_file_offset(page); - sio->bvec[0].bv_page = page; - sio->bvec[0].bv_len = PAGE_SIZE; - sio->bvec[0].bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_write_complete(&sio->iocb, ret); - return ret; + if (wbc->swap_plug) + sio = *wbc->swap_plug; + if (sio) { + if (sio->iocb.ki_filp != swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_write_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = pos; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->swap_plug) { + swap_write_unplug(sio); + sio = NULL; + } + if (wbc->swap_plug) + *wbc->swap_plug = sio; + + return 0; } int __swap_writepage(struct page *page, struct writeback_control *wbc, - bio_end_io_t end_write_func) + bio_end_io_t end_write_func) { struct bio *bio; int ret; @@ -388,6 +409,19 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +void swap_write_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, WRITE, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); +} + static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); diff --git a/mm/swap.h b/mm/swap.h index 7aab4c82e2d0..58f28e94ba56 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -13,6 +13,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug) if (unlikely(plug)) __swap_read_unplug(plug); } +void swap_write_unplug(struct swap_iocb *sio); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct writeback_control *wbc, @@ -68,6 +69,9 @@ static inline int swap_readpage(struct page *page, bool do_poll, { return 0; } +static inline void swap_write_unplug(struct swap_iocb *sio) +{ +} static inline struct address_space *swap_address_space(swp_entry_t entry) { diff --git a/mm/vmscan.c b/mm/vmscan.c index ad5026d06aa8..572627412e7d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1164,7 +1164,8 @@ typedef enum { * pageout is called by shrink_page_list() for each dirty page. * Calls ->writepage(). */ -static pageout_t pageout(struct page *page, struct address_space *mapping) +static pageout_t pageout(struct page *page, struct address_space *mapping, + struct swap_iocb **plug) { /* * If the page is dirty, only perform writeback if that write @@ -1211,6 +1212,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping) .range_start = 0, .range_end = LLONG_MAX, .for_reclaim = 1, + .swap_plug = plug, }; SetPageReclaim(page); @@ -1537,6 +1539,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, unsigned int nr_reclaimed = 0; unsigned int pgactivate = 0; bool do_demote_pass; + struct swap_iocb *plug = NULL; memset(stat, 0, sizeof(*stat)); cond_resched(); @@ -1817,7 +1820,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, * starts and then write it out here. */ try_to_unmap_flush_dirty(); - switch (pageout(page, mapping)) { + switch (pageout(page, mapping, &plug)) { case PAGE_KEEP: goto keep_locked; case PAGE_ACTIVATE: @@ -1971,6 +1974,8 @@ static unsigned int shrink_page_list(struct list_head *page_list, list_splice(&ret_pages, page_list); count_vm_events(PGACTIVATE, pgactivate); + if (plug) + swap_write_unplug(plug); return nr_reclaimed; } From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736827 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD52BC433F5 for ; Mon, 7 Feb 2022 04:48:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 793A36B0073; Sun, 6 Feb 2022 23:48:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 746786B0081; Sun, 6 Feb 2022 23:48:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E4576B0082; Sun, 6 Feb 2022 23:48:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 509D26B0073 for ; Sun, 6 Feb 2022 23:48:01 -0500 (EST) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 1994E20229 for ; Mon, 7 Feb 2022 04:48:01 +0000 (UTC) X-FDA: 79114751562.03.C858669 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf21.hostedemail.com (Postfix) with ESMTP id 8C40F1C000C for ; Mon, 7 Feb 2022 04:48:00 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 901131F37E; Mon, 7 Feb 2022 04:47:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209279; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hJrwVzrGaKWoGSMvm+rLPIL0IUUFJm93Y22uoJC5Nkk=; b=lReWhSSnDcobM6ycj6fui/XTHl4Ovj39cam+75y7vRoTbzMyR4h9RUIMIILSMtCgk6nsD7 zEjh7PyKkEwOyDXeU2ElKQKVs5vbKhZCDndj0ocR3NCGAfSVhx0iEzKaj7z7qgHzhvenST 0gCxxwmiZ7pxS4Lxe58I1c65QPOvQsk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209279; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hJrwVzrGaKWoGSMvm+rLPIL0IUUFJm93Y22uoJC5Nkk=; b=ja4EZk1+1GDm/63DODmOF2vuK1I/biABJ7PqDQO7kF3FkSMkOalgJABNRx+ksjlqqAHMX2 tBbunbIPicmkmtBA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 95D1B1330E; Mon, 7 Feb 2022 04:47:56 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 4NiLFHykAGJWNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:47:56 +0000 Subject: [PATCH 10/21] VFS: Add FMODE_CAN_ODIRECT file flag From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916118.29374.18393494885904268956.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 8C40F1C000C X-Stat-Signature: utrycnky1o95ecauigqfgx39xodf46sb Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=lReWhSSn; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=ja4EZk1+; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf21.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspam-User: nil X-HE-Tag: 1644209280-699909 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently various places test if direct IO is possible on a file by checking for the existence of the direct_IO address space operation. This is a poor choice, as the direct_IO operation may not be used - it is only used if the generic_file_*_iter functions are called for direct IO and some filesystems - particularly NFS - don't do this. Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the various places to check this (avoiding pointer dereferences). do_dentry_open() will set this flag if ->direct_IO is present, so filesystems do not need to be changed. NFS *is* changed, to set the flag explicitly and discard the direct_IO entry in the address_space_operations for files. Other filesystems which currently use noop_direct_IO could usefully be changed to set this flag instead. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- drivers/block/loop.c | 4 ++-- fs/fcntl.c | 9 ++++----- fs/nfs/file.c | 3 ++- fs/open.c | 9 ++++----- fs/overlayfs/file.c | 13 ++++--------- include/linux/fs.h | 3 +++ 6 files changed, 19 insertions(+), 22 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 01cbbfc4e9e2..a2609dd79370 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -184,8 +184,8 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) */ if (dio) { if (queue_logical_block_size(lo->lo_queue) >= sb_bsize && - !(lo->lo_offset & dio_align) && - mapping->a_ops->direct_IO) + !(lo->lo_offset & dio_align) && + (file->f_mode & FMODE_CAN_ODIRECT)) use_dio = true; else use_dio = false; diff --git a/fs/fcntl.c b/fs/fcntl.c index 9c6c6a3e2de5..11e665242a76 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -56,11 +56,10 @@ static int setfl(int fd, struct file * filp, unsigned long arg) arg |= O_NONBLOCK; /* Pipe packetized mode is controlled by O_DIRECT flag */ - if (!S_ISFIFO(inode->i_mode) && (arg & O_DIRECT)) { - if (!filp->f_mapping || !filp->f_mapping->a_ops || - !filp->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if (!S_ISFIFO(inode->i_mode) && + (arg & O_DIRECT) && + !(filp->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; if (filp->f_op->check_flags) error = filp->f_op->check_flags(arg); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 3dbef2c31567..9e2def045111 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -74,6 +74,8 @@ nfs_file_open(struct inode *inode, struct file *filp) return res; res = nfs_open(inode, filp); + if (res == 0) + filp->f_mode |= FMODE_CAN_ODIRECT; return res; } @@ -535,7 +537,6 @@ const struct address_space_operations nfs_file_aops = { .write_end = nfs_write_end, .invalidatepage = nfs_invalidate_page, .releasepage = nfs_release_page, - .direct_IO = nfs_direct_IO, #ifdef CONFIG_MIGRATION .migratepage = nfs_migrate_page, #endif diff --git a/fs/open.c b/fs/open.c index 9ff2f621b760..76ddf9014499 100644 --- a/fs/open.c +++ b/fs/open.c @@ -834,17 +834,16 @@ static int do_dentry_open(struct file *f, if ((f->f_mode & FMODE_WRITE) && likely(f->f_op->write || f->f_op->write_iter)) f->f_mode |= FMODE_CAN_WRITE; + if (f->f_mapping->a_ops && f->f_mapping->a_ops->direct_IO) + f->f_mode |= FMODE_CAN_ODIRECT; f->f_write_hint = WRITE_LIFE_NOT_SET; f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC); file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping); - /* NB: we're sure to have correct a_ops only after f_op->open */ - if (f->f_flags & O_DIRECT) { - if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; /* * XXX: Huge page cache doesn't support writing yet. Drop all page diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index fa125feed0ff..9d69b4dbb8c4 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -82,11 +82,8 @@ static int ovl_change_flags(struct file *file, unsigned int flags) if (((flags ^ file->f_flags) & O_APPEND) && IS_APPEND(inode)) return -EPERM; - if (flags & O_DIRECT) { - if (!file->f_mapping->a_ops || - !file->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if ((flags & O_DIRECT) && !(file->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; if (file->f_op->check_flags) { err = file->f_op->check_flags(flags); @@ -306,8 +303,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !(real.file->f_mode & FMODE_CAN_ODIRECT)) goto out_fdput; old_cred = ovl_override_creds(file_inode(file)->i_sb); @@ -367,8 +363,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !(real.file->f_mode & FMODE_CAN_ODIRECT)) goto out_fdput; if (!ovl_should_sync(OVL_FS(inode->i_sb))) diff --git a/include/linux/fs.h b/include/linux/fs.h index 57e3b387cb17..c34c53267415 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -161,6 +161,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File is stream-like */ #define FMODE_STREAM ((__force fmode_t)0x200000) +/* File supports DIRECT IO */ +#define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000) + /* File was opened by fanotify and shouldn't generate fanotify events */ #define FMODE_NONOTIFY ((__force fmode_t)0x4000000) From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736828 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CEC1DC433EF for ; Mon, 7 Feb 2022 04:48:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5E8586B0074; Sun, 6 Feb 2022 23:48:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5957F6B0081; Sun, 6 Feb 2022 23:48:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 45D4D6B0082; Sun, 6 Feb 2022 23:48:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0253.hostedemail.com [216.40.44.253]) by kanga.kvack.org (Postfix) with ESMTP id 396D46B0074 for ; Sun, 6 Feb 2022 23:48:08 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 02EE788489 for ; Mon, 7 Feb 2022 04:48:08 +0000 (UTC) X-FDA: 79114751856.29.D328552 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf15.hostedemail.com (Postfix) with ESMTP id 9812CA0003 for ; Mon, 7 Feb 2022 04:48:07 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 980B4210E8; Mon, 7 Feb 2022 04:48:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209286; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2w8E8+D8DnDn4oux4W6cKi7wJrd0JGZITu8BgcqO5/k=; b=iIaeyp+u+6up1G0hpe0ns8cxnMeE6N4td3CtAWY+xZIGV+UKST1UGVia/hjHjG3qOIcUAl giG/fcb7PceuAdwQKLhekiYjp3BTPoUdQH5hMxRU7sqNfqk/ybnfaJ4iMjn7mj5MFX1fWO 7Jh6V+gwtrKOjdzEU1rDx5podtWYG/E= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209286; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2w8E8+D8DnDn4oux4W6cKi7wJrd0JGZITu8BgcqO5/k=; b=sb1QOGj4V3V8k41j+kPjQWi6J0FNHQBPDe3wYFw+hB7Sa78/4oiATv1segw7lCSXrSSAJV f8j/FpEi2KgcDMDw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 85BCC1330E; Mon, 7 Feb 2022 04:48:03 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id QQM1EIOkAGJcNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:03 +0000 Subject: [PATCH 11/21] NFS: remove IS_SWAPFILE hack From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916118.29374.2111079497808369007.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=iIaeyp+u; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=sb1QOGj4; spf=pass (imf15.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspam-User: nil X-Rspamd-Queue-Id: 9812CA0003 X-Stat-Signature: o55b7oaru1qtg1qkxxkhab6g6fbsus9r X-Rspamd-Server: rspam12 X-HE-Tag: 1644209287-953614 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This code is pointless as IS_SWAPFILE is always defined. So remove it. Suggested-by: Mark Hemment Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- fs/nfs/file.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 9e2def045111..4d4750738aeb 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -44,11 +44,6 @@ static const struct vm_operations_struct nfs_file_vm_ops; -/* Hack for future NFS swap support */ -#ifndef IS_SWAPFILE -# define IS_SWAPFILE(inode) (0) -#endif - int nfs_check_flags(int flags) { if ((flags & (O_APPEND | O_DIRECT)) == (O_APPEND | O_DIRECT)) From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736829 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37A67C433EF for ; Mon, 7 Feb 2022 04:48:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C27BE6B0075; Sun, 6 Feb 2022 23:48:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BD7886B0081; Sun, 6 Feb 2022 23:48:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A9F226B0082; Sun, 6 Feb 2022 23:48:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0254.hostedemail.com [216.40.44.254]) by kanga.kvack.org (Postfix) with ESMTP id 9D8F76B0075 for ; Sun, 6 Feb 2022 23:48:14 -0500 (EST) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 5D5F588489 for ; Mon, 7 Feb 2022 04:48:14 +0000 (UTC) X-FDA: 79114752108.04.F2D4DF0 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf02.hostedemail.com (Postfix) with ESMTP id CF48580005 for ; Mon, 7 Feb 2022 04:48:13 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id D2BA6210E8; Mon, 7 Feb 2022 04:48:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209292; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ViWGm/depDGf235ykZcrvlwHPhzYZ1IOzNywgeuJlCs=; b=dEj/ZECgQzPmVNNwNtLhRwIK+t+7MZfIgofMU57KXuIxPsebs5TmJ+yBisbfSkJ1Lthv13 WGxGwr5R4TqIlYYAhIz1CsdYyp+aE9q3/2eQvTm0XbIF4xjDADjB0h2337rxl00eMcgeg/ EywU58RZdtc7KMmvbgUOuYuHFQJUK2Q= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209292; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ViWGm/depDGf235ykZcrvlwHPhzYZ1IOzNywgeuJlCs=; b=LB4uxgey7EDf4HC0h1f45bTAwNW+UouWDbatH0XJLCUd1qURVb63ibV21W/iR7wynleyiL nIOTeQVWLnjvIqCQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id F0D7B1330E; Mon, 7 Feb 2022 04:48:09 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id O2hbK4mkAGJiNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:09 +0000 Subject: [PATCH 12/21] SUNRPC/call_alloc: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916119.29374.4983888648335050333.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: CF48580005 X-Rspam-User: nil Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="dEj/ZECg"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=LB4uxgey; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf02.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Stat-Signature: q76de63z5kybt3c6ufd4exmn7xrfir4e X-HE-Tag: 1644209293-530459 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 4 +++- net/sunrpc/xprtrdma/transport.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index e2c835482791..d5b6e897f5a5 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -1023,8 +1023,10 @@ int rpc_malloc(struct rpc_task *task) struct rpc_buffer *buf; gfp_t gfp = GFP_NOFS; + if (RPC_IS_ASYNC(task)) + gfp = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - gfp = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 42e375dbdadb..5714bf880e95 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -570,8 +570,10 @@ xprt_rdma_allocate(struct rpc_task *task) gfp_t flags; flags = RPCRDMA_DEF_GFP; + if (RPC_IS_ASYNC(task)) + flags = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736830 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C77F2C433F5 for ; Mon, 7 Feb 2022 04:48:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 60DB56B0078; Sun, 6 Feb 2022 23:48:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5BE776B0081; Sun, 6 Feb 2022 23:48:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 45CDE6B0082; Sun, 6 Feb 2022 23:48:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0249.hostedemail.com [216.40.44.249]) by kanga.kvack.org (Postfix) with ESMTP id 39A0D6B0078 for ; Sun, 6 Feb 2022 23:48:21 -0500 (EST) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E56528249980 for ; Mon, 7 Feb 2022 04:48:20 +0000 (UTC) X-FDA: 79114752360.09.2B4CA4C Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf29.hostedemail.com (Postfix) with ESMTP id 80C30120003 for ; Mon, 7 Feb 2022 04:48:20 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 79FCD210E8; Mon, 7 Feb 2022 04:48:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209299; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cfq4C5n0j+9ECeq3Bau2+AONhc0gqMkxIyut8YkPtic=; b=hgCZHV82MSeCfvaBuyVLoaZFgR2bAwnq2EwfvobATSy1y0Acb2GWpRb/98vu5xbNi/VbtX 3O+ymKERrikNj+QdS8RRLvotispWnW0mld72vuhTX30xlfM+Kljy5qrCjzTx4X8DS1pgRm naXBFfVimEaA0fUYKm8uaozUcxwuqno= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209299; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cfq4C5n0j+9ECeq3Bau2+AONhc0gqMkxIyut8YkPtic=; b=i6Tee3fCM6I1m/ImAiY+h0T6ZFa3vYICUzE299xZFiVQv5P8W7ze9YcOUifBiJHL5woRsG YBkWVL1UV/SIISAg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 7FFA11330E; Mon, 7 Feb 2022 04:48:16 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id RLw6D5CkAGJoNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:16 +0000 Subject: [PATCH 13/21] SUNRPC/auth: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916119.29374.16954351742259185040.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 80C30120003 X-Stat-Signature: kd6z6pdbsesr3gb7zk1jkzhcq15bb3ot Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=hgCZHV82; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=i6Tee3fC; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf29.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspam-User: nil X-HE-Tag: 1644209300-794529 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries. Signed-off-by: NeilBrown --- include/linux/sunrpc/auth.h | 1 + net/sunrpc/auth.c | 6 +++++- net/sunrpc/auth_gss/auth_gss.c | 6 +++++- net/sunrpc/auth_unix.c | 10 ++++++++-- net/sunrpc/clnt.c | 3 +++ 5 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h index 98da816b5fc2..3e6ce288a7fc 100644 --- a/include/linux/sunrpc/auth.h +++ b/include/linux/sunrpc/auth.h @@ -99,6 +99,7 @@ struct rpc_auth_create_args { /* Flags for rpcauth_lookupcred() */ #define RPCAUTH_LOOKUP_NEW 0x01 /* Accept an uninitialised cred */ +#define RPCAUTH_LOOKUP_ASYNC 0x02 /* Don't block waiting for memory */ /* * Client authentication ops diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index a9f0d17fdb0d..6bfa19f9fa6a 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -615,6 +615,8 @@ rpcauth_bind_root_cred(struct rpc_task *task, int lookupflags) }; struct rpc_cred *ret; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; ret = auth->au_ops->lookup_cred(auth, &acred, lookupflags); put_cred(acred.cred); return ret; @@ -631,6 +633,8 @@ rpcauth_bind_machine_cred(struct rpc_task *task, int lookupflags) if (!acred.principal) return NULL; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; return auth->au_ops->lookup_cred(auth, &acred, lookupflags); } @@ -654,7 +658,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) }; if (flags & RPC_TASK_ASYNC) - lookupflags |= RPCAUTH_LOOKUP_NEW; + lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC; if (task->tk_op_cred) /* Task must use exactly this rpc_cred */ new = get_rpccred(task->tk_op_cred); diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 5f42aa5fc612..df72d6301f78 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -1341,7 +1341,11 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits) static struct rpc_cred * gss_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - return rpcauth_lookup_credcache(auth, acred, flags, GFP_NOFS); + gfp_t gfp = GFP_NOFS; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + return rpcauth_lookup_credcache(auth, acred, flags, gfp); } static struct rpc_cred * diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c index e7df1f782b2e..e5819265dd1b 100644 --- a/net/sunrpc/auth_unix.c +++ b/net/sunrpc/auth_unix.c @@ -43,8 +43,14 @@ unx_destroy(struct rpc_auth *auth) static struct rpc_cred * unx_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - struct rpc_cred *ret = mempool_alloc(unix_pool, GFP_NOFS); - + gfp_t gfp = GFP_NOFS; + struct rpc_cred *ret; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + ret = mempool_alloc(unix_pool, gfp); + if (!ret) + return ERR_PTR(-ENOMEM); rpcauth_init_cred(ret, acred, auth, &unix_credops); ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE; return ret; diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index c83fe618767c..d1fb7c0c7685 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1745,6 +1745,9 @@ call_refreshresult(struct rpc_task *task) task->tk_cred_retry--; trace_rpc_retry_refresh_status(task); return; + case -ENOMEM: + rpc_delay(task, HZ >> 4); + return; } trace_rpc_refresh_status(task); rpc_call_rpcerror(task, status); From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736831 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 667F8C433F5 for ; Mon, 7 Feb 2022 04:48:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F0E596B0081; Sun, 6 Feb 2022 23:48:27 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EBCDC6B0082; Sun, 6 Feb 2022 23:48:27 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D85326B0083; Sun, 6 Feb 2022 23:48:27 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0180.hostedemail.com [216.40.44.180]) by kanga.kvack.org (Postfix) with ESMTP id CA3416B0081 for ; Sun, 6 Feb 2022 23:48:27 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 91CBE95271 for ; Mon, 7 Feb 2022 04:48:27 +0000 (UTC) X-FDA: 79114752654.12.480F0D5 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf11.hostedemail.com (Postfix) with ESMTP id 3A70940005 for ; Mon, 7 Feb 2022 04:48:27 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 353921F37E; Mon, 7 Feb 2022 04:48:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209306; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=a+dzRdJ1CEiYYNz26tJO1Hsz32DqZM0fgdDIWGvYfvU=; b=lfv/cyb2G2DNPQTKbRop55g6PPjG1/708V4hQnTQAafD0bZ9UtKAxn2iYQoECGw8O1j85V PMIrabGZ3le/+8BBDqS1s3koU+vRWhWcFT3uueX8MIn2jnE5amCTbNjCpeGlSpnN03buIS CL0t0YYd98rDIgC4MeIwwFYNCzNrrt4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209306; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=a+dzRdJ1CEiYYNz26tJO1Hsz32DqZM0fgdDIWGvYfvU=; b=sbcOPan0DVFL0E5MLU5X6ME2JEKy2viqKrzKBOOAt7EgYngT/daz1mGkkQLHl+tcBAOFff pE+zXdtPfmXl/6BA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 244891330E; Mon, 7 Feb 2022 04:48:22 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id sQAQNJakAGJ0NQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:22 +0000 Subject: [PATCH 14/21] SUNRPC/xprt: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916120.29374.8047763295599370687.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: ytdjf7b4qfghysmnkgjj7srfmjfrrtqo X-Rspam-User: nil Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="lfv/cyb2"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=sbcOPan0; spf=pass (imf11.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 3A70940005 X-HE-Tag: 1644209307-452004 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. Signed-off-by: NeilBrown --- net/sunrpc/xprt.c | 5 ++++- net/sunrpc/xprtrdma/transport.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a02de2bddb28..47d207e416ab 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1687,12 +1687,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt) { struct rpc_rqst *req = ERR_PTR(-EAGAIN); + gfp_t gfp_mask = GFP_NOFS; if (xprt->num_reqs >= xprt->max_reqs) goto out; ++xprt->num_reqs; spin_unlock(&xprt->reserve_lock); - req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS); + if (current->flags & PF_WQ_WORKER) + gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; + req = kzalloc(sizeof(struct rpc_rqst), gfp_mask); spin_lock(&xprt->reserve_lock); if (req != NULL) goto out; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 5714bf880e95..923e4b512ee9 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -517,7 +517,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) return; out_sleep: - task->tk_status = -EAGAIN; + task->tk_status = -ENOMEM; xprt_add_backlog(xprt, task); } From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736832 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20E38C433F5 for ; Mon, 7 Feb 2022 04:48:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A53F66B0082; Sun, 6 Feb 2022 23:48:35 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A02906B0083; Sun, 6 Feb 2022 23:48:35 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8CB7B6B0085; Sun, 6 Feb 2022 23:48:35 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0195.hostedemail.com [216.40.44.195]) by kanga.kvack.org (Postfix) with ESMTP id 7FAC06B0082 for ; Sun, 6 Feb 2022 23:48:35 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 4461595271 for ; Mon, 7 Feb 2022 04:48:35 +0000 (UTC) X-FDA: 79114752990.19.FD05F26 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf21.hostedemail.com (Postfix) with ESMTP id C7DD41C000A for ; Mon, 7 Feb 2022 04:48:34 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id CCCD8210E8; Mon, 7 Feb 2022 04:48:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209313; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ca/QMA9b8aSP2XqDNpzMoh70WOqFOvfMXBDfQmXLg8g=; b=g7MhZp92/rnAsaBQF/LBN2grTXpM1lSbRGyvPNrVbdb93Hur/KShishK0kboEEwtuBgXEN eV5m3zMF/R1LupwK2CXzQ0u20tEnJ9/jxsjPA8hEJhCJA48lpiq2r6PjIQxxf6AYRjBofP Fhn0FQcRrR9S9O3YbkmJ9Fq76of65ME= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209313; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ca/QMA9b8aSP2XqDNpzMoh70WOqFOvfMXBDfQmXLg8g=; b=S7cvC95vxV7Ud1UDWIFO6Oa2myUbnd6qE+5EbMwEIspBQunKCydhhC2WbDtoFc/abo0+oz jz1Nwrey9mHimyDg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 7A6061330E; Mon, 7 Feb 2022 04:48:30 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id jU5zDZ6kAGJ/NQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:30 +0000 Subject: [PATCH 15/21] SUNRPC: remove scheduling boost for "SWAPPER" tasks. From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916121.29374.301108979284279934.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: C7DD41C000A X-Stat-Signature: pfw5h1jjj3d41kne4smdgtoxffsukboq X-Rspam-User: nil Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=g7MhZp92; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=S7cvC95v; spf=pass (imf21.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1644209314-241896 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue. This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO). This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 7 ------- net/sunrpc/xprt.c | 11 ----------- 2 files changed, 18 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index d5b6e897f5a5..256302bf6557 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue, /* * Add new request to wait queue. - * - * Swapper tasks always get inserted at the head of the queue. - * This should avoid many nasty memory deadlocks and hopefully - * improve overall performance. - * Everyone else gets appended to the queue to ensure proper FIFO behavior. */ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task, @@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, INIT_LIST_HEAD(&task->u.tk_wait.timer_list); if (RPC_IS_PRIORITY(queue)) __rpc_add_wait_queue_priority(queue, task, queue_priority); - else if (RPC_IS_SWAPPER(task)) - list_add(&task->u.tk_wait.list, &queue->tasks[0]); else list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]); task->tk_waitqueue = queue; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 47d207e416ab..a0a2583fe941 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1354,17 +1354,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task) INIT_LIST_HEAD(&req->rq_xmit2); goto out; } - } else if (RPC_IS_SWAPPER(task)) { - list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { - if (pos->rq_cong || pos->rq_bytes_sent) - continue; - if (RPC_IS_SWAPPER(pos->rq_task)) - continue; - /* Note: req is added _before_ pos */ - list_add_tail(&req->rq_xmit, &pos->rq_xmit); - INIT_LIST_HEAD(&req->rq_xmit2); - goto out; - } } else if (!req->rq_seqno) { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner) From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736833 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D94CCC433F5 for ; Mon, 7 Feb 2022 04:48:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 746546B0087; Sun, 6 Feb 2022 23:48:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6F3DD6B0085; Sun, 6 Feb 2022 23:48:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 595366B0087; Sun, 6 Feb 2022 23:48:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0234.hostedemail.com [216.40.44.234]) by kanga.kvack.org (Postfix) with ESMTP id 4A21E6B0083 for ; Sun, 6 Feb 2022 23:48:43 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 0D8781813BCE0 for ; Mon, 7 Feb 2022 04:48:43 +0000 (UTC) X-FDA: 79114753326.23.7560A7D Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf03.hostedemail.com (Postfix) with ESMTP id 1A08320004 for ; Mon, 7 Feb 2022 04:48:41 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 2472B1F37E; Mon, 7 Feb 2022 04:48:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209321; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=abp/o94+2rJ4ao5gU5Y3VRhEK9AxXE9gpElUGz0LzVk=; b=CCn6rwGgP1ohSmx2GAqIuYP8+Qj6wqy0px0oekO6styNpRvwCtVHm+LgMww7xKi0dZxr33 F9KVX5dJGAHKxI8KHRFoW/9WEqDbubyDmE97JuSIsq374hFBq9bgFNGw7hHp+kM2IV8b6V v8ETLUb5TzNyIoMihm1YrQCqDarH5TE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209321; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=abp/o94+2rJ4ao5gU5Y3VRhEK9AxXE9gpElUGz0LzVk=; b=zX3qPGqzKSp/jTG3hChxzUyM+Yu2Mr9oyUJcRa20dUWYAXy1z0gdUFudbbOcKK+2H6ZalB KkALKn9wZTiQL3CA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 47EF01330E; Mon, 7 Feb 2022 04:48:37 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id HcYZAaWkAGKHNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:37 +0000 Subject: [PATCH 16/21] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916121.29374.12391432623392264427.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 1A08320004 X-Stat-Signature: rhc7jesratqx37una9zu7k9mczjtimb4 Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=CCn6rwGg; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=zX3qPGqz; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf03.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspam-User: nil X-HE-Tag: 1644209321-999081 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown --- fs/nfs/read.c | 4 ---- include/linux/nfs_fs.h | 5 ----- include/linux/sunrpc/sched.h | 1 - include/trace/events/sunrpc.h | 1 - net/sunrpc/auth.c | 2 +- 5 files changed, 1 insertion(+), 12 deletions(-) diff --git a/fs/nfs/read.c b/fs/nfs/read.c index eb00229c1a50..cd797ce3a67c 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -194,10 +194,6 @@ static void nfs_initiate_read(struct nfs_pgio_header *hdr, const struct nfs_rpc_ops *rpc_ops, struct rpc_task_setup *task_setup_data, int how) { - struct inode *inode = hdr->inode; - int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0; - - task_setup_data->flags |= swap_flags; rpc_ops->read_setup(hdr, msg); trace_nfs_initiate_read(hdr); } diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 02aa49323d1d..ff8b3820409c 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -45,11 +45,6 @@ */ #define NFS_MAX_TRANSPORTS 16 -/* - * These are the default flags for swap requests - */ -#define NFS_RPC_SWAPFLAGS (RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS) - /* * Size of the NFS directory verifier */ diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index db964bb63912..56710f8056d3 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -124,7 +124,6 @@ struct rpc_task_setup { #define RPC_TASK_MOVEABLE 0x0004 /* nfs4.1+ rpc tasks */ #define RPC_TASK_NULLCREDS 0x0010 /* Use AUTH_NULL credential */ #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ -#define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ #define RPC_TASK_NO_ROUND_ROBIN 0x0100 /* send requests on "main" xprt */ #define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */ diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 29982d60b68a..ac33892da411 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -311,7 +311,6 @@ TRACE_EVENT(rpc_request, { RPC_TASK_MOVEABLE, "MOVEABLE" }, \ { RPC_TASK_NULLCREDS, "NULLCREDS" }, \ { RPC_CALL_MAJORSEEN, "MAJORSEEN" }, \ - { RPC_TASK_ROOTCREDS, "ROOTCREDS" }, \ { RPC_TASK_DYNAMIC, "DYNAMIC" }, \ { RPC_TASK_NO_ROUND_ROBIN, "NO_ROUND_ROBIN" }, \ { RPC_TASK_SOFT, "SOFT" }, \ diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 6bfa19f9fa6a..682fcd24bf43 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -670,7 +670,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) /* If machine cred couldn't be bound, try a root cred */ if (new) ; - else if (cred == &machine_cred || (flags & RPC_TASK_ROOTCREDS)) + else if (cred == &machine_cred) new = rpcauth_bind_root_cred(task, lookupflags); else if (flags & RPC_TASK_NULLCREDS) new = authnull_ops.lookup_cred(NULL, NULL, 0); From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736834 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E68DAC433EF for ; Mon, 7 Feb 2022 04:48:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 87E3D6B0083; Sun, 6 Feb 2022 23:48:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 82CD66B0085; Sun, 6 Feb 2022 23:48:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6A84A6B0088; Sun, 6 Feb 2022 23:48:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0207.hostedemail.com [216.40.44.207]) by kanga.kvack.org (Postfix) with ESMTP id 5B1546B0083 for ; Sun, 6 Feb 2022 23:48:49 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 1A48495271 for ; Mon, 7 Feb 2022 04:48:49 +0000 (UTC) X-FDA: 79114753578.25.A1E4322 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf08.hostedemail.com (Postfix) with ESMTP id 95AC9160002 for ; Mon, 7 Feb 2022 04:48:48 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9A4A7210E8; Mon, 7 Feb 2022 04:48:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209327; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NiDP9RcSA5ZHYXdBK498Y6AaH3rCTlt+B3aX1Ji9K8c=; b=yFQcY7p2U+yMBieYQ9Ap5kiuPCxStsoseCRC/zR9E8aD0xHJbjesBUcCSH2xeK28Xr+qra dBFVvzDjLz+SJR7Iz4DHuuW4NfNXt/7NY9lNVgM8wxG8miIeReC8BtVti8+5AsmJdOPQjI qhhGKn3Dig0Sw1MKlQQ6CBz1NZmWp3I= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209327; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NiDP9RcSA5ZHYXdBK498Y6AaH3rCTlt+B3aX1Ji9K8c=; b=aKDca+HVCXocFEMyCBbTi5HaOEdAbBSDpju6By0CvJgdLM20DtD9hw8g8NZToqQcESJ+Hg g/+Hg4IWfhZVLSAA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9D0F91330E; Mon, 7 Feb 2022 04:48:44 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id WBpCFqykAGKKNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:44 +0000 Subject: [PATCH 17/21] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916122.29374.8760841832230503670.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 95AC9160002 X-Stat-Signature: mn6nbpqfa5eqizufy4mkaj5kc8um368c Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=yFQcY7p2; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=aKDca+HV; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf08.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspam-User: nil X-HE-Tag: 1644209328-578 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: rpc tasks can be marked as RPC_TASK_SWAPPER. This causes GFP_MEMALLOC to be used for some allocations. This is needed in some cases, but not in all where it is currently provided, and in some where it isn't provided. Currently *all* tasks associated with a rpc_client on which swap is enabled get the flag and hence some GFP_MEMALLOC support. GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it. However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does need it. xdr_alloc_bvec is called while the XPRT_LOCK is held. If this blocks, then it blocks all other queued tasks. So this allocation needs GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used for any swap writes. Similarly, if the transport is not connected, that will block all requests including swap writes, so memory allocations should get GFP_MEMALLOC if swap writes are possible. So with this patch: 1/ we ONLY set RPC_TASK_SWAPPER for swap writes. 2/ __rpc_execute() sets PF_MEMALLOC while handling any task with RPC_TASK_SWAPPER set, or when handling any task that holds the XPRT_LOCKED lock on an xprt used for swap. This removes the need for the RPC_IS_SWAPPER() test in ->buf_alloc handlers. 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking any task to a swapper xprt. __rpc_execute() will clear it. 3/ PF_MEMALLOC is set for all the connect workers. Signed-off-by: NeilBrown Reviewed-by: Chuck Lever --- fs/nfs/write.c | 2 ++ net/sunrpc/clnt.c | 2 -- net/sunrpc/sched.c | 20 +++++++++++++++++--- net/sunrpc/xprt.c | 3 +++ net/sunrpc/xprtrdma/transport.c | 6 ++++-- net/sunrpc/xprtsock.c | 8 ++++++++ 6 files changed, 34 insertions(+), 7 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 987a187bd39a..9f7176745fef 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1409,6 +1409,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr, { int priority = flush_task_priority(how); + if (IS_SWAPFILE(hdr->inode)) + task_setup_data->flags |= RPC_TASK_SWAPPER; task_setup_data->priority = priority; rpc_ops->write_setup(hdr, msg, &task_setup_data->rpc_client); trace_nfs_initiate_write(hdr); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index d1fb7c0c7685..842366a2fc57 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1085,8 +1085,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt) task->tk_flags |= RPC_TASK_TIMEOUT; if (clnt->cl_noretranstimeo) task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT; - if (atomic_read(&clnt->cl_swapper)) - task->tk_flags |= RPC_TASK_SWAPPER; /* Add to the client's list of all tasks */ spin_lock(&clnt->cl_lock); list_add_tail(&task->tk_task, &clnt->cl_tasks); diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 256302bf6557..9020cedb7c95 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -869,6 +869,15 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata) ops->rpc_release(calldata); } +static bool xprt_needs_memalloc(struct rpc_xprt *xprt, struct rpc_task *tk) +{ + if (!xprt) + return false; + if (!atomic_read(&xprt->swapper)) + return false; + return test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == tk; +} + /* * This is the RPC `scheduler' (or rather, the finite state machine). */ @@ -877,6 +886,7 @@ static void __rpc_execute(struct rpc_task *task) struct rpc_wait_queue *queue; int task_is_async = RPC_IS_ASYNC(task); int status = 0; + unsigned long pflags = current->flags; WARN_ON_ONCE(RPC_IS_QUEUED(task)); if (RPC_IS_QUEUED(task)) @@ -899,6 +909,10 @@ static void __rpc_execute(struct rpc_task *task) } if (!do_action) break; + if (RPC_IS_SWAPPER(task) || + xprt_needs_memalloc(task->tk_xprt, task)) + current->flags |= PF_MEMALLOC; + trace_rpc_task_run_action(task, do_action); do_action(task); @@ -936,7 +950,7 @@ static void __rpc_execute(struct rpc_task *task) rpc_clear_running(task); spin_unlock(&queue->lock); if (task_is_async) - return; + goto out; /* sync task: sleep here */ trace_rpc_task_sync_sleep(task, task->tk_action); @@ -960,6 +974,8 @@ static void __rpc_execute(struct rpc_task *task) /* Release all resources associated with the task */ rpc_release_task(task); +out: + current_restore_flags(pflags, PF_MEMALLOC); } /* @@ -1018,8 +1034,6 @@ int rpc_malloc(struct rpc_task *task) if (RPC_IS_ASYNC(task)) gfp = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a0a2583fe941..0614e7463d4b 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1492,6 +1492,9 @@ bool xprt_prepare_transmit(struct rpc_task *task) return false; } + if (atomic_read(&xprt->swapper)) + /* This will be clear in __rpc_execute */ + current->flags |= PF_MEMALLOC; return true; } diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 923e4b512ee9..6b7e10e5a141 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -235,8 +235,11 @@ xprt_rdma_connect_worker(struct work_struct *work) struct rpcrdma_xprt *r_xprt = container_of(work, struct rpcrdma_xprt, rx_connect_worker.work); struct rpc_xprt *xprt = &r_xprt->rx_xprt; + unsigned int pflags = current->flags; int rc; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; rc = rpcrdma_xprt_connect(r_xprt); xprt_clear_connecting(xprt); if (!rc) { @@ -250,6 +253,7 @@ xprt_rdma_connect_worker(struct work_struct *work) rpcrdma_xprt_disconnect(r_xprt); xprt_unlock_connect(xprt, r_xprt); xprt_wake_pending_tasks(xprt, rc); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -572,8 +576,6 @@ xprt_rdma_allocate(struct rpc_task *task) flags = RPCRDMA_DEF_GFP; if (RPC_IS_ASYNC(task)) flags = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 69b6ee5a5fd1..c461a0ce9531 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2047,7 +2047,10 @@ static void xs_udp_setup_socket(struct work_struct *work) struct rpc_xprt *xprt = &transport->xprt; struct socket *sock; int status = -EIO; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_DGRAM, IPPROTO_UDP, false); @@ -2067,6 +2070,7 @@ static void xs_udp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); xprt_unlock_connect(xprt, transport); xprt_wake_pending_tasks(xprt, status); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -2226,7 +2230,10 @@ static void xs_tcp_setup_socket(struct work_struct *work) struct socket *sock = transport->sock; struct rpc_xprt *xprt = &transport->xprt; int status; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; if (!sock) { sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_STREAM, @@ -2291,6 +2298,7 @@ static void xs_tcp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); out_unlock: xprt_unlock_connect(xprt, transport); + current_restore_flags(pflags, PF_MEMALLOC); } /** From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736835 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8054C433EF for ; Mon, 7 Feb 2022 04:48:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 45E676B007D; Sun, 6 Feb 2022 23:48:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 40E0C6B007E; Sun, 6 Feb 2022 23:48:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 287FB6B0085; Sun, 6 Feb 2022 23:48:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0051.hostedemail.com [216.40.44.51]) by kanga.kvack.org (Postfix) with ESMTP id 1A63C6B007D for ; Sun, 6 Feb 2022 23:48:56 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id DB280181200C0 for ; Mon, 7 Feb 2022 04:48:55 +0000 (UTC) X-FDA: 79114753830.20.3C5171A Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf29.hostedemail.com (Postfix) with ESMTP id 5E488120004 for ; Mon, 7 Feb 2022 04:48:55 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 65A351F37E; Mon, 7 Feb 2022 04:48:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209334; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GWtpS01SlD8ZVUrxVP9gFCCl2UNqlrRmBNEVUmeRtmo=; b=kNIWdVn1JtQxtbk8/Nx6GLgboyR8KAmP+K5JduuWBPGclcqClVmtLpJHItE3omianjO5z0 SNC5hZolH+T0LtFtHRunNrPdXTu+PCGez5JWpaaddP63DdRBmZfM9Ae6yzgDoLDMK7T+NJ GNWeX1+vPCr0azmCFmtAIUAYS/fy9Z4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209334; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GWtpS01SlD8ZVUrxVP9gFCCl2UNqlrRmBNEVUmeRtmo=; b=ffx2tefwVFLCky+w3WrFtjZq7gyasAyO/zejnRvdHSVlZDEVIZOmdA8EDAYGU7iLhs5gIn pfPeko0mKo156ECg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 7FE1A1330E; Mon, 7 Feb 2022 04:48:51 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id HAu/D7OkAGKSNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:51 +0000 Subject: [PATCH 18/21] NFSv4: keep state manager thread active if swap is enabled From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916123.29374.17511219549445805575.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 5E488120004 X-Stat-Signature: uc5hgtp9xd3jxsdtohy4oce9bzt3kwqh Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=kNIWdVn1; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=ffx2tefw; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf29.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspam-User: nil X-HE-Tag: 1644209335-88070 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If we are swapping over NFSv4, we may not be able to allocate memory to start the state-manager thread at the time when we need it. So keep it always running when swap is enabled, and just signal it to start. This requires updating and testing the cl_swapper count on the root rpc_clnt after following all ->cl_parent links. Signed-off-by: NeilBrown --- fs/nfs/file.c | 15 ++++++++++++--- fs/nfs/nfs4_fs.h | 1 + fs/nfs/nfs4proc.c | 20 ++++++++++++++++++++ fs/nfs/nfs4state.c | 39 +++++++++++++++++++++++++++++++++------ include/linux/nfs_xdr.h | 2 ++ net/sunrpc/clnt.c | 2 ++ 6 files changed, 70 insertions(+), 9 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 4d4750738aeb..81fe996c6272 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -486,8 +486,9 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, unsigned long blocks; long long isize; int ret; - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); - struct inode *inode = file->f_mapping->host; + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; if (!file->f_mapping->a_ops->swap_rw) /* Cannot support swap */ @@ -512,14 +513,22 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, } *span = sis->pages; sis->flags |= SWP_FS_OPS; + + if (cl->rpc_ops->enable_swap) + cl->rpc_ops->enable_swap(inode); + return ret; } static void nfs_swap_deactivate(struct file *file) { - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; rpc_clnt_swap_deactivate(clnt); + if (cl->rpc_ops->disable_swap) + cl->rpc_ops->disable_swap(file_inode(file)); } const struct address_space_operations nfs_file_aops = { diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index 84f39b6f1b1e..79df6e83881b 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -42,6 +42,7 @@ enum nfs4_client_state { NFS4CLNT_LEASE_MOVED, NFS4CLNT_DELEGATION_EXPIRED, NFS4CLNT_RUN_MANAGER, + NFS4CLNT_MANAGER_AVAILABLE, NFS4CLNT_RECALL_RUNNING, NFS4CLNT_RECALL_ANY_LAYOUT_READ, NFS4CLNT_RECALL_ANY_LAYOUT_RW, diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index b18f31b2c9e7..d3549f48b012 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -10461,6 +10461,24 @@ static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size) return error + error2 + error3; } +static void nfs4_enable_swap(struct inode *inode) +{ + /* The state manager thread must always be running. + * It will notice the client is a swapper, and stay put. + */ + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; + + nfs4_schedule_state_manager(clp); +} + +static void nfs4_disable_swap(struct inode *inode) +{ + /* The state manager thread will now exit once it is + * woken. + */ + wake_up_var(&NFS_SERVER(inode)->nfs_client->cl_state); +} + static const struct inode_operations nfs4_dir_inode_operations = { .create = nfs_create, .lookup = nfs_lookup, @@ -10538,6 +10556,8 @@ const struct nfs_rpc_ops nfs_v4_clientops = { .create_server = nfs4_create_server, .clone_server = nfs_clone_server, .discover_trunking = nfs4_discover_trunking, + .enable_swap = nfs4_enable_swap, + .disable_swap = nfs4_disable_swap, }; static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index f5a62c0d999b..5dc52eefaffb 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1205,10 +1205,17 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) { struct task_struct *task; char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1]; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state); - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) + if (test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) != 0) { + wake_up_var(&clp->cl_state); return; + } + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); __module_get(THIS_MODULE); refcount_inc(&clp->cl_count); @@ -1224,6 +1231,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) printk(KERN_ERR "%s: kthread_run: %ld\n", __func__, PTR_ERR(task)); nfs4_clear_state_manager_bit(clp); + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); nfs_put_client(clp); module_put(THIS_MODULE); } @@ -2669,11 +2677,8 @@ static void nfs4_state_manager(struct nfs_client *clp) clear_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state); } - /* Did we race with an attempt to give us more work? */ - if (!test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) - return; - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) - return; + return; + } while (refcount_read(&clp->cl_count) > 1 && !signalled()); goto out_drain; @@ -2693,9 +2698,31 @@ static void nfs4_state_manager(struct nfs_client *clp) static int nfs4_run_state_manager(void *ptr) { struct nfs_client *clp = ptr; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; allow_signal(SIGKILL); +again: + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); nfs4_state_manager(clp); + if (atomic_read(&cl->cl_swapper)) { + wait_var_event_interruptible(&clp->cl_state, + test_bit(NFS4CLNT_RUN_MANAGER, + &clp->cl_state)); + if (atomic_read(&cl->cl_swapper) && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) + goto again; + /* Either no longer a swapper, or were signalled */ + } + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); + + if (refcount_read(&clp->cl_count) > 1 && !signalled() && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) && + !test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state)) + goto again; + nfs_put_client(clp); module_put_and_kthread_exit(0); return 0; diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 728cb0c1f0b6..6861ac8af808 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1798,6 +1798,8 @@ struct nfs_rpc_ops { struct nfs_server *(*clone_server)(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, rpc_authflavor_t); int (*discover_trunking)(struct nfs_server *, struct nfs_fh *); + void (*enable_swap)(struct inode *inode); + void (*disable_swap)(struct inode *inode); }; /* diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 842366a2fc57..04ccf6a06ca7 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -3069,6 +3069,8 @@ rpc_clnt_swap_activate_callback(struct rpc_clnt *clnt, int rpc_clnt_swap_activate(struct rpc_clnt *clnt) { + while (clnt != clnt->cl_parent) + clnt = clnt->cl_parent; if (atomic_inc_return(&clnt->cl_swapper) == 1) return rpc_clnt_iterate_for_each_xprt(clnt, rpc_clnt_swap_activate_callback, NULL); From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736836 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58D01C433EF for ; Mon, 7 Feb 2022 04:49:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D87906B0072; Sun, 6 Feb 2022 23:49:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D35C56B007E; Sun, 6 Feb 2022 23:49:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BD7466B0080; Sun, 6 Feb 2022 23:49:02 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id AF40F6B0072 for ; Sun, 6 Feb 2022 23:49:02 -0500 (EST) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id 89C0980243 for ; Mon, 7 Feb 2022 04:49:02 +0000 (UTC) X-FDA: 79114754124.03.A6A70CF Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf14.hostedemail.com (Postfix) with ESMTP id 10C29100006 for ; Mon, 7 Feb 2022 04:49:01 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 136B61F37E; Mon, 7 Feb 2022 04:49:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209341; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fwJWsWKAHR1izFzYRPVHxF5KD02ioN/att4BQUXt4ok=; b=sz1wvonUJnZOIzGF+QuZIo3wNsZGbbukf6KwBl8JLi/mdJzgN/XeIp0TF7n8p/LI9esb0m EY0JicYakn6JS/QO7pzsdsGFM/oCJlwBltgIhvZUiPS3RZxLEy4KFVi1LKxYoP3wYcROhf KhBa9TRRcd11eO9EB7rpX2jA40FKh3s= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209341; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fwJWsWKAHR1izFzYRPVHxF5KD02ioN/att4BQUXt4ok=; b=IbYQw1WEfmd3EdeiUL3S5rCqI3og24Anje/7B22P1GrYriZCFWzQrYiQlyfnM17v053Lbd yARR1Vb5UxkDklBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 31A8A1330E; Mon, 7 Feb 2022 04:48:57 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id WPFKOLmkAGKUNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:48:57 +0000 Subject: [PATCH 19/21] NFS: rename nfs_direct_IO and use as ->swap_rw From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916124.29374.1697475326428179041.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Queue-Id: 10C29100006 X-Rspam-User: nil Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=sz1wvonU; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=IbYQw1WE; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf14.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-Stat-Signature: omjdr5dcybuwx3fy1tkxj64yfiqxbd76 X-Rspamd-Server: rspam08 X-HE-Tag: 1644209341-411592 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a while. We now need a ->swap_rw function which behaves slightly differently, returning zero for success rather than a byte count. So modify nfs_direct_IO accordingly, rename it, and use it as the ->swap_rw function. Note: it still won't work - that will be fixed in later patches. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- fs/nfs/direct.c | 23 ++++++++++------------- fs/nfs/file.c | 5 +---- include/linux/nfs_fs.h | 2 +- 3 files changed, 12 insertions(+), 18 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index eabfdab543c8..b929dd5b0c3a 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -153,28 +153,25 @@ nfs_direct_count_bytes(struct nfs_direct_req *dreq, } /** - * nfs_direct_IO - NFS address space operation for direct I/O + * nfs_swap_rw - NFS address space operation for swap I/O * @iocb: target I/O control block * @iter: I/O buffer * - * The presence of this routine in the address space ops vector means - * the NFS client supports direct I/O. However, for most direct IO, we - * shunt off direct read and write requests before the VFS gets them, - * so this method is only ever called for swap. + * Perform IO to the swap-file. This is much like direct IO. */ -ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) +int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter) { - struct inode *inode = iocb->ki_filp->f_mapping->host; - - /* we only support swap file calling nfs_direct_IO */ - if (!IS_SWAPFILE(inode)) - return 0; + ssize_t ret; VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE); if (iov_iter_rw(iter) == READ) - return nfs_file_direct_read(iocb, iter); - return nfs_file_direct_write(iocb, iter); + ret = nfs_file_direct_read(iocb, iter); + else + ret = nfs_file_direct_write(iocb, iter); + if (ret < 0) + return ret; + return 0; } static void nfs_direct_release_pages(struct page **pages, unsigned int npages) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 81fe996c6272..7d42117b210d 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -490,10 +490,6 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, struct rpc_clnt *clnt = NFS_CLIENT(inode); struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; - if (!file->f_mapping->a_ops->swap_rw) - /* Cannot support swap */ - return -EINVAL; - spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; @@ -549,6 +545,7 @@ const struct address_space_operations nfs_file_aops = { .error_remove_page = generic_error_remove_page, .swap_activate = nfs_swap_activate, .swap_deactivate = nfs_swap_deactivate, + .swap_rw = nfs_swap_rw, }; /* diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index ff8b3820409c..58807406aff6 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -506,7 +506,7 @@ static inline const struct cred *nfs_file_cred(struct file *file) /* * linux/fs/nfs/direct.c */ -extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); +int nfs_swap_rw(struct kiocb *, struct iov_iter *); extern ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter); extern ssize_t nfs_file_direct_write(struct kiocb *iocb, From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736837 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07446C433F5 for ; Mon, 7 Feb 2022 04:49:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8F2486B0073; Sun, 6 Feb 2022 23:49:09 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8A5D16B007E; Sun, 6 Feb 2022 23:49:09 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7749B6B0073; Sun, 6 Feb 2022 23:49:09 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0173.hostedemail.com [216.40.44.173]) by kanga.kvack.org (Postfix) with ESMTP id 68E036B0073 for ; Sun, 6 Feb 2022 23:49:09 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2AB5C8249980 for ; Mon, 7 Feb 2022 04:49:09 +0000 (UTC) X-FDA: 79114754418.08.EF5A5F9 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf26.hostedemail.com (Postfix) with ESMTP id 8C108140002 for ; Mon, 7 Feb 2022 04:49:08 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 8AA5A1F37E; Mon, 7 Feb 2022 04:49:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209347; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E23xUbeV6J41TMiR7CnKzUTjVG3PkIHbIUyHUCQmcCM=; b=DQV/jOLzgYJmX7dA9MU+4JDWUtKP8wz/wJe6qbSqgmBqKXJ6eVZAKFYVip7GuG3sg3fBnQ wt0TvsfTYlMLV3dbF5dMYc5ZEX9gzSfjTgeezWXv4hUdUHsL3igXaq8oMjAR1ciTFq16Ub rpL6oOaWIQv/kuuLWh8qWX1+NkAVKl4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209347; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E23xUbeV6J41TMiR7CnKzUTjVG3PkIHbIUyHUCQmcCM=; b=m4KQ4C12YdWdLAG1uGn6XelqJS2eHedV7cweGciC7jVETHpTpXYPuIhUbod3bEBp5S2QN9 dDhBtSiePirgGlBA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A523B1330E; Mon, 7 Feb 2022 04:49:04 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id TKTZGMCkAGKaNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:49:04 +0000 Subject: [PATCH 20/21] NFS: swap IO handling is slightly different for O_DIRECT IO From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916125.29374.6888726725038584805.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 8C108140002 X-Stat-Signature: 1uq4txx4815wwusgwymyk7xrqewb3o3d X-Rspam-User: nil Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="DQV/jOLz"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=m4KQ4C12; spf=pass (imf26.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1644209348-13051 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown --- fs/nfs/direct.c | 42 ++++++++++++++++++++++++++++-------------- fs/nfs/file.c | 4 ++-- include/linux/nfs_fs.h | 8 ++++---- 3 files changed, 34 insertions(+), 20 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index b929dd5b0c3a..c5c53219beeb 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -166,9 +166,9 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter) VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE); if (iov_iter_rw(iter) == READ) - ret = nfs_file_direct_read(iocb, iter); + ret = nfs_file_direct_read(iocb, iter, true); else - ret = nfs_file_direct_write(iocb, iter); + ret = nfs_file_direct_write(iocb, iter, true); if (ret < 0) return ret; return 0; @@ -422,6 +422,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_read - file direct read operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers into which to read data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct reads instead of calling * generic_file_aio_read() in order to avoid gfar's check to see if @@ -437,7 +438,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * client must read the updated atime from the server back into its * cache. */ -ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; @@ -479,12 +481,14 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) if (iter_is_iovec(iter)) dreq->flags = NFS_ODIRECT_SHOULD_DIRTY; - nfs_start_io_direct(inode); + if (!swap) + nfs_start_io_direct(inode); NFS_I(inode)->read_io += count; requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos); - nfs_end_io_direct(inode); + if (!swap) + nfs_end_io_direct(inode); if (requested > 0) { result = nfs_direct_wait(dreq); @@ -873,6 +877,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_write - file direct write operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers from which to write data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct writes instead of calling * generic_file_aio_write() in order to avoid taking the inode @@ -889,7 +894,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * Note that O_APPEND is not supported for NFS direct writes, as there * is no atomic O_APPEND write facility in the NFS protocol. */ -ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { ssize_t result, requested; size_t count; @@ -903,7 +909,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos); - result = generic_write_checks(iocb, iter); + if (swap) + /* bypass generic checks */ + result = iov_iter_count(iter); + else + result = generic_write_checks(iocb, iter); if (result <= 0) return result; count = result; @@ -934,16 +944,20 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dreq->iocb = iocb; pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode); - nfs_start_io_direct(inode); + if (swap) { + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + } else { + nfs_start_io_direct(inode); - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); - if (mapping->nrpages) { - invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end); - } + if (mapping->nrpages) { + invalidate_inode_pages2_range(mapping, + pos >> PAGE_SHIFT, end); + } - nfs_end_io_direct(inode); + nfs_end_io_direct(inode); + } if (requested > 0) { result = nfs_direct_wait(dreq); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 7d42117b210d..ceacae8e7a38 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -159,7 +159,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to) ssize_t result; if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_read(iocb, to); + return nfs_file_direct_read(iocb, to, false); dprintk("NFS: read(%pD2, %zu@%lu)\n", iocb->ki_filp, @@ -634,7 +634,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) return result; if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_write(iocb, from); + return nfs_file_direct_write(iocb, from, false); dprintk("NFS: write(%pD2, %zu@%Ld)\n", file, iov_iter_count(from), (long long) iocb->ki_pos); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 58807406aff6..22aa5c08e3ed 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -507,10 +507,10 @@ static inline const struct cred *nfs_file_cred(struct file *file) * linux/fs/nfs/direct.c */ int nfs_swap_rw(struct kiocb *, struct iov_iter *); -extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - struct iov_iter *iter); -extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); +ssize_t nfs_file_direct_read(struct kiocb *iocb, + struct iov_iter *iter, bool swap); +ssize_t nfs_file_direct_write(struct kiocb *iocb, + struct iov_iter *iter, bool swap); /* * linux/fs/nfs/dir.c From patchwork Mon Feb 7 04:46:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12736838 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51571C433EF for ; Mon, 7 Feb 2022 04:49:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DD3D86B0074; Sun, 6 Feb 2022 23:49:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D839A6B007B; Sun, 6 Feb 2022 23:49:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C72296B007E; Sun, 6 Feb 2022 23:49:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0069.hostedemail.com [216.40.44.69]) by kanga.kvack.org (Postfix) with ESMTP id B771B6B0074 for ; Sun, 6 Feb 2022 23:49:15 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 7EC9A96345 for ; Mon, 7 Feb 2022 04:49:15 +0000 (UTC) X-FDA: 79114754670.14.32F2CEC Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf19.hostedemail.com (Postfix) with ESMTP id 023F01A0003 for ; Mon, 7 Feb 2022 04:49:14 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id F1FD3210E8; Mon, 7 Feb 2022 04:49:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644209353; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8RjmMA1NrNB1dcX8VlrCIeURT/zjbUyKgTSkkvTTrZ4=; b=eH4GEA19qQ07ImRUOY2A8Sm2l4231phSU/9mIfpFmNGSBtN/OXjXOLrrPmN7/7gZBO9LOU znkMvj6r5hd7Bp8KTHQm/KjMzjhkUiPV9Q3+bx8G1KBxzb65nFaj4VRcI6tZcCQtbMR76J 8TPio5UkYLoXB+d/+vSet5L5w5cH1FU= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644209353; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8RjmMA1NrNB1dcX8VlrCIeURT/zjbUyKgTSkkvTTrZ4=; b=gg+Xt3GS+NULgkfMEQ5/eDYBs276HjW/e20Y9e5NxQtC0tLeRLB/8gCLHZiJqdYdH5eMew OACLY5ZojWUu/3Bw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id EA35D1330E; Mon, 7 Feb 2022 04:49:10 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id PnInKcakAGKgNQAAMHmgww (envelope-from ); Mon, 07 Feb 2022 04:49:10 +0000 Subject: [PATCH 21/21] NFS: swap-out must always use STABLE writes. From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mark Hemment , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 07 Feb 2022 15:46:01 +1100 Message-ID: <164420916125.29374.15563574398247024921.stgit@noble.brown> In-Reply-To: <164420889455.29374.17958998143835612560.stgit@noble.brown> References: <164420889455.29374.17958998143835612560.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 023F01A0003 X-Rspam-User: nil Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=eH4GEA19; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=gg+Xt3GS; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf19.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Stat-Signature: 3o3pbrjebbps74xxcknofiot53f91bxf X-HE-Tag: 1644209354-804077 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE. Signed-off-by: NeilBrown --- fs/nfs/direct.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index c5c53219beeb..4eb2a8380a28 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -791,7 +791,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { */ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, struct iov_iter *iter, - loff_t pos) + loff_t pos, int ioflags) { struct nfs_pageio_descriptor desc; struct inode *inode = dreq->inode; @@ -799,7 +799,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t requested_bytes = 0; size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE); - nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false, + nfs_pageio_init_write(&desc, inode, ioflags, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; get_dreq(dreq); @@ -945,11 +945,13 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode); if (swap) { - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_STABLE); } else { nfs_start_io_direct(inode); - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_COND_STABLE); if (mapping->nrpages) { invalidate_inode_pages2_range(mapping,