From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721478
Subject: [PATCH 01/23] MM: create new mm/swap.h header file.
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611271.26253.2968456569309914722.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>
User-Agent: StGit/0.23

Many functions declared in include/linux/swap.h are only used within mm/.

Create a new "mm/swap.h" and move some of these declarations there.
Remove the redundant 'extern' from the function declarations.

Reviewed-by: Christoph Hellwig
Signed-off-by: NeilBrown
---
 include/linux/swap.h |  121 -----------------------------------------------
 mm/madvise.c         |    1 
 mm/memcontrol.c      |    1 
 mm/memory.c          |    1 
 mm/mincore.c         |    1 
 mm/page_alloc.c      |    1 
 mm/page_io.c         |    1 
 mm/shmem.c           |    1 
 mm/swap.h            |  129 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/swap_state.c      |    1 
 mm/swapfile.c        |    1 
 mm/util.c            |    1 
 mm/vmscan.c          |    1 
 13 files changed, 140 insertions(+), 121 deletions(-)
 create mode 100644 mm/swap.h

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1d38d9475c4d..3f54a8941c9d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,62 +419,19 @@ extern void kswapd_stop(int nid);
 
 #ifdef CONFIG_SWAP
 
-#include /* for bio_end_io_t */
-
-/* linux/mm/page_io.c */
-extern int swap_readpage(struct page *page, bool do_poll);
-extern int swap_writepage(struct page *page, struct writeback_control *wbc);
-extern void end_swap_bio_write(struct bio *bio);
-extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
-	bio_end_io_t end_write_func);
 extern int swap_set_page_dirty(struct page *page);
-
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block);
 int generic_swapfile_activate(struct swap_info_struct *, struct file *,
 		sector_t *);
 
-/* linux/mm/swap_state.c */
-/* One swap address space for each 64M swap space */
-#define SWAP_ADDRESS_SPACE_SHIFT	14
-#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
-extern struct address_space *swapper_spaces[];
-#define swap_address_space(entry)			    \
-	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
-		>> SWAP_ADDRESS_SPACE_SHIFT])
 static inline unsigned long total_swapcache_pages(void)
 {
 	return global_node_page_state(NR_SWAPCACHE);
 }
 
-extern void show_swap_cache_info(void);
-extern int add_to_swap(struct page *page);
-extern void *get_shadow_from_swap_cache(swp_entry_t entry);
-extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
-			gfp_t gfp, void **shadowp);
-extern void __delete_from_swap_cache(struct page *page,
-			swp_entry_t entry, void *shadow);
-extern void delete_from_swap_cache(struct page *);
-extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
-				unsigned long end);
-extern void free_swap_cache(struct page *);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
-extern struct page *lookup_swap_cache(swp_entry_t entry,
-				      struct vm_area_struct *vma,
-				      unsigned long addr);
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
-extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr,
-			bool do_poll);
-extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr,
-			bool *new_page_allocated);
-extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
-				struct vm_fault *vmf);
-extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
-				struct vm_fault *vmf);
-
 /* linux/mm/swapfile.c */
 extern atomic_long_t nr_swap_pages;
 extern long total_swap_pages;
@@ -528,12 +485,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
 }
 
 #else /* CONFIG_SWAP */
-
-static inline int swap_readpage(struct page *page, bool do_poll)
-{
-	return 0;
-}
-
 static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 {
 	return NULL;
@@ -548,11 +499,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
 {
 }
 
-static inline struct address_space *swap_address_space(swp_entry_t entry)
-{
-	return NULL;
-}
-
 #define get_nr_swap_pages()			0L
 #define total_swap_pages			0L
 #define total_swapcache_pages()			0UL
@@ -567,14 +513,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
 #define free_pages_and_swap_cache(pages, nr) \
 	release_pages((pages), (nr));
 
-static inline void free_swap_cache(struct page *page)
-{
-}
-
-static inline void show_swap_cache_info(void)
-{
-}
-
 /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
 #define free_swap_and_cache(e) is_pfn_swap_entry(e)
 
@@ -600,65 +538,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp)
 {
 }
 
-static inline struct page *swap_cluster_readahead(swp_entry_t entry,
-				gfp_t gfp_mask, struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
-{
-	return 0;
-}
-
-static inline struct page *lookup_swap_cache(swp_entry_t swp,
-					     struct vm_area_struct *vma,
-					     unsigned long addr)
-{
-	return NULL;
-}
-
-static inline
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
-{
-	return find_get_page(mapping, index);
-}
-
-static inline int add_to_swap(struct page *page)
-{
-	return 0;
-}
-
-static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
-{
-	return NULL;
-}
-
-static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
-					gfp_t gfp_mask, void **shadowp)
-{
-	return -1;
-}
-
-static inline void __delete_from_swap_cache(struct page *page,
-					swp_entry_t entry, void *shadow)
-{
-}
-
-static inline void delete_from_swap_cache(struct page *page)
-{
-}
-
-static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
-				unsigned long end)
-{
-}
 
 static inline int page_swapcount(struct page *page)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index 5604064df464..1ee4b7583379 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -35,6 +35,7 @@
 #include
 
 #include "internal.h"
+#include "swap.h"
 
 struct madvise_walk_private {
 	struct mmu_gather *tlb;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09d342c7cbd0..9b7c8181a207 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -66,6 +66,7 @@
 #include
 #include
 #include "slab.h"
+#include "swap.h"
 
 #include
 
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..d25372340107 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -86,6 +86,7 @@
 
 #include "pgalloc-track.h"
 #include "internal.h"
+#include "swap.h"
 
 #if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST)
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..f4f627325e12 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -20,6 +20,7 @@
 #include
 
 #include
+#include "swap.h"
 
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..221aa3c10b78 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -81,6 +81,7 @@
 #include "internal.h"
 #include "shuffle.h"
 #include "page_reporting.h"
+#include "swap.h"
 
 /* Free Page Internal flags: for internal, non-pcp variants of free_pages(). */
 typedef int __bitwise fpi_t;
diff --git a/mm/page_io.c b/mm/page_io.c
index 0bf8e40f4e57..f8c26092e869 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include "swap.h"
 
 void end_swap_bio_write(struct bio *bio)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index a09b29ec2b45..c8b8819fe2e6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include "swap.h"
 
 static struct vfsmount *shm_mnt;
 
diff --git a/mm/swap.h b/mm/swap.h
new file mode 100644
index 000000000000..13e72a5023aa
--- /dev/null
+++ b/mm/swap.h
@@ -0,0 +1,129 @@
+
+#ifdef CONFIG_SWAP
+#include /* for bio_end_io_t */
+
+/* linux/mm/page_io.c */
+int swap_readpage(struct page *page, bool do_poll);
+int swap_writepage(struct page *page, struct writeback_control *wbc);
+void end_swap_bio_write(struct bio *bio);
+int __swap_writepage(struct page *page, struct writeback_control *wbc,
+		     bio_end_io_t end_write_func);
+
+/* linux/mm/swap_state.c */
+/* One swap address space for each 64M swap space */
+#define SWAP_ADDRESS_SPACE_SHIFT	14
+#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
+extern struct address_space *swapper_spaces[];
+#define swap_address_space(entry)			    \
+	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
+		>> SWAP_ADDRESS_SPACE_SHIFT])
+
+void show_swap_cache_info(void);
+int add_to_swap(struct page *page);
+void *get_shadow_from_swap_cache(swp_entry_t entry);
+int add_to_swap_cache(struct page *page, swp_entry_t entry,
+		      gfp_t gfp, void **shadowp);
+void __delete_from_swap_cache(struct page *page,
+			      swp_entry_t entry, void *shadow);
+void delete_from_swap_cache(struct page *);
+void clear_shadow_from_swap_cache(int type, unsigned long begin,
+				  unsigned long end);
+void free_swap_cache(struct page *);
+struct page *lookup_swap_cache(swp_entry_t entry,
+			       struct vm_area_struct *vma,
+			       unsigned long addr);
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
+
+struct page *read_swap_cache_async(swp_entry_t, gfp_t,
+				   struct vm_area_struct *vma,
+				   unsigned long addr,
+				   bool do_poll);
+struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
+				     struct vm_area_struct *vma,
+				     unsigned long addr,
+				     bool *new_page_allocated);
+struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
+				    struct vm_fault *vmf);
+struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
+			      struct vm_fault *vmf);
+
+#else /* CONFIG_SWAP */
+static inline int swap_readpage(struct page *page, bool do_poll)
+{
+	return 0;
+}
+
+static inline struct address_space *swap_address_space(swp_entry_t entry)
+{
+	return NULL;
+}
+
+static inline void free_swap_cache(struct page *page)
+{
+}
+
+static inline void show_swap_cache_info(void)
+{
+}
+
+static inline struct page *swap_cluster_readahead(swp_entry_t entry,
+				gfp_t gfp_mask, struct vm_fault *vmf)
+{
+	return NULL;
+}
+
+static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
+			struct vm_fault *vmf)
+{
+	return NULL;
+}
+
+static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
+{
+	return 0;
+}
+
+static inline struct page *lookup_swap_cache(swp_entry_t swp,
+					     struct vm_area_struct *vma,
+					     unsigned long addr)
+{
+	return NULL;
+}
+
+static inline
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
+{
+	return find_get_page(mapping, index);
+}
+
+static inline int add_to_swap(struct page *page)
+{
+	return 0;
+}
+
+static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+	return NULL;
+}
+
+static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
+					gfp_t gfp_mask, void **shadowp)
+{
+	return -1;
+}
+
+static inline void __delete_from_swap_cache(struct page *page,
+					swp_entry_t entry, void *shadow)
+{
+}
+
+static inline void delete_from_swap_cache(struct page *page)
+{
+}
+
+static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
+				unsigned long end)
+{
+}
+
+#endif /* CONFIG_SWAP */
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8d4104242100..bb38453425c7 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include "internal.h"
+#include "swap.h"
 
 /*
  * swapper_space is a fiction, retained to simplify the path through
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bf0df7aa7158..71c7a31dd291 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include "swap.h"
 
 static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 				 unsigned char);
diff --git a/mm/util.c b/mm/util.c
index 7e43369064c8..619697e3d935 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -27,6 +27,7 @@
 #include
 
 #include "internal.h"
+#include "swap.h"
 
 /**
  * kfree_const - conditionally free memory
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 090bfb605ecf..5c734ffc6057 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,7 @@
 #include
 
 #include "internal.h"
+#include "swap.h"
 
 #define CREATE_TRACE_POINTS
 #include

From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721479
Subject: [PATCH 02/23] MM: extend block-plugging to cover all swap reads with
 read-ahead
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611274.26253.13900771841681128440.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>
User-Agent: StGit/0.23

Code that does swap read-ahead uses blk_start_plug() and
blk_finish_plug() to allow lower levels to combine multiple read-ahead
pages into a single request, but calls blk_finish_plug() *before*
submitting the original (non-ahead) read request.
This misses an opportunity to combine read requests.

This patch moves the blk_finish_plug() call to *after* all the reads.
This will likely combine the primary read with some of the "ahead"
reads, and that may slightly increase the latency of that read, but it
should more than make up for this by making more efficient use of the
storage path.

The patch mostly makes the code look more consistent.
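
As a rough illustration of why the position of blk_finish_plug()
matters, here is a toy userspace model in C.  It is not kernel code:
struct toy_plug and all toy_* functions are invented for this sketch,
merging is reduced to counting runs of consecutive sectors, and the
primary read is assumed to sit directly after the read-ahead pages.

```c
/* Toy userspace model of block plugging; NOT the kernel implementation. */
#include <assert.h>
#include <stddef.h>

#define TOY_PLUG_MAX 32

/* Hypothetical stand-in for struct blk_plug: a list of queued reads. */
struct toy_plug {
	unsigned long sector[TOY_PLUG_MAX];
	size_t nr;
};

static void toy_start_plug(struct toy_plug *p)
{
	p->nr = 0;
}

/* While plugged, a read is only queued, not issued. */
static void toy_submit_read(struct toy_plug *p, unsigned long sector)
{
	assert(p->nr < TOY_PLUG_MAX);
	p->sector[p->nr++] = sector;
}

/*
 * Unplug: return how many requests reach the device once each run of
 * consecutive sectors (in queue order) is merged into one request.
 */
static size_t toy_finish_plug(struct toy_plug *p)
{
	size_t i, requests = 0;

	for (i = 0; i < p->nr; i++)
		if (i == 0 || p->sector[i] != p->sector[i - 1] + 1)
			requests++;
	p->nr = 0;
	return requests;
}

/* Old ordering: unplug after the read-ahead, then issue the primary read alone. */
size_t toy_requests_old(unsigned long start, size_t nr_ahead)
{
	struct toy_plug plug;
	size_t i, requests;

	toy_start_plug(&plug);
	for (i = 0; i < nr_ahead; i++)
		toy_submit_read(&plug, start + i);
	requests = toy_finish_plug(&plug);
	return requests + 1;	/* primary read goes out unplugged */
}

/* New ordering: the primary read is queued before the plug is finished. */
size_t toy_requests_new(unsigned long start, size_t nr_ahead)
{
	struct toy_plug plug;
	size_t i;

	toy_start_plug(&plug);
	for (i = 0; i < nr_ahead; i++)
		toy_submit_read(&plug, start + i);
	toy_submit_read(&plug, start + nr_ahead);	/* primary read */
	return toy_finish_plug(&plug);
}
```

In this model, four read-ahead pages followed by the primary read cost
two requests with the old ordering and one with the new ordering; with
no read-ahead at all, both orderings issue a single request.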

Performance change is unlikely to be noticeable.

Fixes-no-auto-backport: 3fb5c298b04e ("swap: allow swap readahead to be merged")
Signed-off-by: NeilBrown
Reviewed-by: Christoph Hellwig
---
 mm/swap_state.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index bb38453425c7..093ecf864200 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -625,6 +625,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
 
+	blk_start_plug(&plug);
 	mask = swapin_nr_pages(offset) - 1;
 	if (!mask)
 		goto skip;
@@ -638,7 +639,6 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (end_offset >= si->max)
 		end_offset = si->max - 1;
 
-	blk_start_plug(&plug);
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = __read_swap_cache_async(
@@ -655,11 +655,12 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		}
 		put_page(page);
 	}
-	blk_finish_plug(&plug);
 
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 skip:
-	return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);
+	page = read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);
+	blk_finish_plug(&plug);
+	return page;
 }
 
 int init_swap_address_space(unsigned int type, unsigned long nr_pages)
@@ -800,11 +801,11 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		.win = 1,
 	};
 
+	blk_start_plug(&plug);
 	swap_ra_info(vmf, &ra_info);
 	if (ra_info.win == 1)
 		goto skip;
 
-	blk_start_plug(&plug);
 	for (i = 0, pte = ra_info.ptes; i < ra_info.nr_pte;
 	     i++, pte++) {
 		pentry = *pte;
@@ -828,11 +829,12 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		}
 		put_page(page);
 	}
-	blk_finish_plug(&plug);
 	lru_add_drain();
 skip:
-	return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
+	page = read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
 				     ra_info.win == 1);
+	blk_finish_plug(&plug);
+	return page;
 }
 
 /**

From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721480
Subject: [PATCH 03/23] MM: drop swap_set_page_dirty
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611274.26253.3394253485576079921.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>
User-Agent: StGit/0.23

Pages that are written to swap are owned by the MM subsystem - not any
filesystem.

When such a page is passed to a filesystem to be written out to a
swap-file, the filesystem handles the data, but the page itself does not
belong to the filesystem.  So calling the filesystem's set_page_dirty
address_space operation makes no sense.  That operation is meant for
pages in the given address space, and a page to be written to swap does
not exist in the given address space.

So drop swap_set_page_dirty(), which calls the address-space's
set_page_dirty, and always use __set_page_dirty_no_writeback, which is
appropriate for pages being swapped out.

Fixes-no-auto-backport: 62c230bc1790 ("mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages")
Signed-off-by: NeilBrown
Reviewed-by: Christoph Hellwig
---
 include/linux/swap.h |    1 -
 mm/page_io.c         |   14 --------------
 mm/swap_state.c      |    2 +-
 3 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3f54a8941c9d..a43929f7033e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,7 +419,6 @@ extern void kswapd_stop(int nid);
 
 #ifdef CONFIG_SWAP
 
-extern int swap_set_page_dirty(struct page *page);
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block);
 int generic_swapfile_activate(struct swap_info_struct *, struct file *,
diff --git a/mm/page_io.c b/mm/page_io.c
index f8c26092e869..34b12d6f94d7 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -438,17 +438,3 @@ int swap_readpage(struct page *page, bool synchronous)
 	delayacct_swapin_end();
 	return ret;
 }
-
-int swap_set_page_dirty(struct page *page)
-{
-	struct swap_info_struct *sis = page_swap_info(page);
-
-	if (data_race(sis->flags & SWP_FS_OPS)) {
-		struct address_space *mapping = sis->swap_file->f_mapping;
-
-		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
-		return mapping->a_ops->set_page_dirty(page);
-	} else {
-		return __set_page_dirty_no_writeback(page);
-	}
-}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 093ecf864200..d541594be1c3 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -31,7 +31,7 @@
  */
 static const struct address_space_operations swap_aops = {
 	.writepage	= swap_writepage,
-	.set_page_dirty	= swap_set_page_dirty,
+	.set_page_dirty	= __set_page_dirty_no_writeback,
 #ifdef CONFIG_MIGRATION
 	.migratepage	= migrate_page,
 #endif

From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721481
Subject: [PATCH 04/23] MM: move responsibility for setting SWP_FS_OPS to
 ->swap_activate
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611275.26253.11641346650863170349.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>
User-Agent: StGit/0.23

If a filesystem wishes to handle all swap IO itself (via ->direct_IO),
rather than just providing device addresses for submit_bio(),
SWP_FS_OPS must be set.

Currently the protocol for setting this is to have ->swap_activate
return zero.  In that case SWP_FS_OPS is set, and add_swap_extent() is
called for the entire file.

This is a little clumsy as different return values for ->swap_activate
have quite different meanings, and it makes it hard to search for which
filesystems require SWP_FS_OPS to be set.
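
The return-value convention described above can be sketched as a small
userspace C model.  This is not the kernel's setup_swap_extents(): the
TOY_* flag values and toy_flags_after_activate() are invented for
illustration, and a positive return is assumed here to mean that the
filesystem has already registered its own extents.

```c
/* Toy model of how the ->swap_activate return value is interpreted. */
#include <assert.h>

/* Hypothetical flag values; the real SWP_* constants differ. */
#define TOY_SWP_ACTIVATED	0x1u
#define TOY_SWP_FS_OPS		0x2u

/*
 * Map a ->swap_activate return value to the flags the caller sets
 * under the convention this patch removes:
 *   ret <  0: error, no flags set
 *   ret == 0: filesystem handles all swap IO, so FS_OPS is set as well
 *   ret >  0: activated, extents assumed to be provided by the filesystem
 */
unsigned int toy_flags_after_activate(int ret)
{
	unsigned int flags = 0;

	assert(ret >= -4095);	/* errno-style negative codes only */
	if (ret < 0)
		return flags;
	flags |= TOY_SWP_ACTIVATED;
	if (ret == 0)
		flags |= TOY_SWP_FS_OPS;
	return flags;
}
```

The single zero case carrying an extra meaning is what makes the
convention awkward: nothing in a filesystem's source mentions
SWP_FS_OPS, so a search for the flag does not find its users.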

So remove the special meaning of a zero return, and require the
filesystem to set SWP_FS_OPS if it so desires, and to always call
add_swap_extent() as required.

Currently only NFS and CIFS return zero from ->swap_activate.

Signed-off-by: NeilBrown
Reviewed-by: Christoph Hellwig
---
 fs/cifs/file.c       |    3 ++-
 fs/nfs/file.c        |   13 +++++++++++--
 include/linux/swap.h |    6 ++++++
 mm/swapfile.c        |   10 +++-------
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 59334be9ed3b..c795d4a9ec4a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4974,7 +4974,8 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
 	 * from reading or writing the file
 	 */
 
-	return 0;
+	sis->flags |= SWP_FS_OPS;
+	return add_swap_extent(sis, 0, sis->max, 0);
 }
 
 static void cifs_swap_deactivate(struct file *file)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 76d76acbc594..d5aa55c7edb0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -488,6 +488,7 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 {
 	unsigned long blocks;
 	long long isize;
+	int ret;
 	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
 	struct inode *inode = file->f_mapping->host;
 
@@ -500,9 +501,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 		return -EINVAL;
 	}
 
+	ret = rpc_clnt_swap_activate(clnt);
+	if (ret)
+		return ret;
+	ret = add_swap_extent(sis, 0, sis->max, 0);
+	if (ret < 0) {
+		rpc_clnt_swap_deactivate(clnt);
+		return ret;
+	}
 	*span = sis->pages;
-
-	return rpc_clnt_swap_activate(clnt);
+	sis->flags |= SWP_FS_OPS;
+	return ret;
 }
 
 static void nfs_swap_deactivate(struct file *file)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index a43929f7033e..b57cff3c5ac2 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -573,6 +573,12 @@ static inline swp_entry_t get_swap_page(struct page *page)
 	return entry;
 }
 
+static inline int add_swap_extent(struct swap_info_struct *sis,
+				  unsigned long start_page,
+				  unsigned long nr_pages, sector_t start_block)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_SWAP */
 
 #ifdef CONFIG_THP_SWAP
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 71c7a31dd291..ed6028aea8bf 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2347,13 +2347,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 
 	if (mapping->a_ops->swap_activate) {
 		ret = mapping->a_ops->swap_activate(sis, swap_file, span);
-		if (ret >= 0)
-			sis->flags |= SWP_ACTIVATED;
-		if (!ret) {
-			sis->flags |= SWP_FS_OPS;
-			ret = add_swap_extent(sis, 0, sis->max, 0);
-			*span = sis->pages;
-		}
+		if (ret < 0)
+			return ret;
+		sis->flags |= SWP_ACTIVATED;
 		return ret;
 	}
 

From patchwork Mon Jan 24 03:48:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721482
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 45A0CC433EF
 for ; Mon, 24 Jan 2022 03:51:16 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S241169AbiAXDvP (ORCPT ); Sun, 23 Jan 2022 22:51:15 -0500
Received: from smtp-out2.suse.de ([195.135.220.29]:46874 "EHLO
 smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S241175AbiAXDvJ (ORCPT ); Sun, 23 Jan 2022 22:51:09 -0500
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
 (No client certificate requested)
 by smtp-out2.suse.de (Postfix) with ESMTPS id 82AA41F3A0;
 Mon, 24 Jan 2022 03:51:08 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996268; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7aGIJFrJoxVYTOarUMpcVxyhqOqRdBZGTPLRKjCZ04Y=; b=n+mu4ts5W6++NB/HdVWdE8/u2lXQLbHMlfwpoZTqDMnapsQ13f+u7f9HhjxBTWTO5iDyLQ D0Iy1Xqj9oNV7juUdF1ZhfwDMEWP8KHBfIA5kx9sPvOBTcgObD+IXl6Guud+j9drUFn32J kIgz1fZoFZeIUalBTmx+NouK2MsD3mw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996268; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7aGIJFrJoxVYTOarUMpcVxyhqOqRdBZGTPLRKjCZ04Y=; b=8DCw4T5/4ajAli4QaAAQISZwKGlcsbSrCUT2KT+Qt+OEuiQAoF11dpSgoz/pGYcsT324Nj PW28866cIoVuzYDg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2098E1331A; Mon, 24 Jan 2022 03:51:04 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id rbgdMygi7mH6RAAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:51:04 +0000 Subject: [PATCH 05/23] MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611276.26253.11555458501911153645.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: 
linux-nfs@vger.kernel.org If swap-out is using filesystem operations (SWP_FS_OPS), then it is not safe to enter the FS for reclaim. So only down-grade the requirement for swap pages to __GFP_IO after checking that SWP_FS_OPS is not being used. This makes the calculation of "may_enter_fs" slightly more complex, so move it into a separate function. With that done, there is little value in maintaining the bool variable any more. So replace the may_enter_fs variable with a may_enter_fs() function. This removes any risk of the variable becoming out-of-date. Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig --- mm/swap.h | 8 ++++++++ mm/vmscan.c | 29 ++++++++++++++++++++--------- 2 files changed, 28 insertions(+), 9 deletions(-) diff --git a/mm/swap.h b/mm/swap.h index 13e72a5023aa..5c676e55f288 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -47,6 +47,10 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); +static inline unsigned int page_swap_flags(struct page *page) +{ + return page_swap_info(page)->flags; +} #else /* CONFIG_SWAP */ static inline int swap_readpage(struct page *page, bool do_poll) { @@ -126,4 +130,8 @@ static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, { } +static inline unsigned int page_swap_flags(struct page *page) +{ + return 0; +} #endif /* CONFIG_SWAP */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 5c734ffc6057..ad5026d06aa8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1506,6 +1506,22 @@ static unsigned int demote_page_list(struct list_head *demote_pages, return nr_succeeded; } +static bool may_enter_fs(struct page *page, gfp_t gfp_mask) +{ + if (gfp_mask & __GFP_FS) + return true; + if (!PageSwapCache(page) || !(gfp_mask & __GFP_IO)) + return false; + /* + * We can "enter_fs" for swap-cache with only __GFP_IO + * providing this isn't SWP_FS_OPS.
+ * ->flags can be updated non-atomically (scan_swap_map_slots), + * but that will never affect SWP_FS_OPS, so the data_race + * is safe. + */ + return !data_race(page_swap_flags(page) & SWP_FS_OPS); +} + /* * shrink_page_list() returns the number of reclaimed pages */ @@ -1531,7 +1547,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, struct address_space *mapping; struct page *page; enum page_references references = PAGEREF_RECLAIM; - bool dirty, writeback, may_enter_fs; + bool dirty, writeback; unsigned int nr_pages; cond_resched(); @@ -1555,9 +1571,6 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (!sc->may_unmap && page_mapped(page)) goto keep_locked; - may_enter_fs = (sc->gfp_mask & __GFP_FS) || - (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO)); - /* * The number of dirty pages determines if a node is marked * reclaim_congested. kswapd will stall and start writing @@ -1602,7 +1615,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, * not to fs). In this case mark the page for immediate * reclaim and continue scanning. * - * Require may_enter_fs because we would wait on fs, which + * Require may_enter_fs() because we would wait on fs, which * may not have submitted IO yet.
And the loop driver might * enter reclaim, and deadlock if it waits on a page for * which it is needed to do the write (loop masks off @@ -1634,7 +1647,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, /* Case 2 above */ } else if (writeback_throttling_sane(sc) || - !PageReclaim(page) || !may_enter_fs) { + !PageReclaim(page) || !may_enter_fs(page, sc->gfp_mask)) { /* * This is slightly racy - end_page_writeback() * might have just cleared PageReclaim, then @@ -1724,8 +1737,6 @@ static unsigned int shrink_page_list(struct list_head *page_list, goto activate_locked_split; } - may_enter_fs = true; - /* Adding to swap updated mapping */ mapping = page_mapping(page); } @@ -1795,7 +1806,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (references == PAGEREF_RECLAIM_CLEAN) goto keep_locked; - if (!may_enter_fs) + if (!may_enter_fs(page, sc->gfp_mask)) goto keep_locked; if (!sc->may_writepage) goto keep_locked; From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721483 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 513C4C433F5 for ; Mon, 24 Jan 2022 03:51:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241170AbiAXDv2 (ORCPT ); Sun, 23 Jan 2022 22:51:28 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:56790 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241187AbiAXDvX (ORCPT ); Sun, 23 Jan 2022 22:51:23 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client 
certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 79C7521997; Mon, 24 Jan 2022 03:51:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996282; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UjoQDQNwue/WVNkMxv8gaLA+H08AHR1sHCxK8SfrfOw=; b=XVBLmQnnCMglGf7sofMx13jU9ti0vSDAbbTrplDPXD5muzuUqsLZldgv1FSPaYt85eRp1i WY8xXVg5vrhP5AF6MtuGLdMJBDus++//IYrQ4cVewtx6j9EVvdOTx0F/pjQE68I/RnwDFG yW+wzRl1U655t1avEYt0BfFnfo0tN3w= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996282; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UjoQDQNwue/WVNkMxv8gaLA+H08AHR1sHCxK8SfrfOw=; b=6V3PJsW+Rs6ucHlwRLcUPg3K55VPHFjsiQSM7IP1eNnFXZwG5OO8dWqHcRb9sQkJs7/gG1 a+3I7cJILENAkVDg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3DF5413305; Mon, 24 Jan 2022 03:51:17 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id FwPMOjUi7mEORQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:51:17 +0000 Subject: [PATCH 06/23] MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 
Message-ID: <164299611276.26253.13667789323141516970.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not the most efficient approach. swap uses ->direct_IO for writes; while this is adequate, it is an inappropriate overloading. ->direct_IO may need to handle allocating space for holes or other details that are not relevant for swap. So this patch introduces a new address_space operation: ->swap_rw. In this patch it is used for reads, and a subsequent patch will switch writes to use it. No filesystem yet supports ->swap_rw, but that is not a problem because no filesystem actually works with filesystem-based swap. Only two filesystems set SWP_FS_OPS: - cifs sets the flag, but ->direct_IO always fails so swap cannot work. - nfs sets the flag, but ->direct_IO calls generic_write_checks() which has failed on swap files for several releases. To ensure that a NULL ->swap_rw isn't called, ->swap_activate() for both NFS and cifs is changed to fail if ->swap_rw is not set. This can be removed if/when the function is added. Future patches will restore swap-over-NFS functionality. To submit an async read with ->swap_rw() we need to allocate a structure to hold the kiocb and other details. swap_readpage() cannot handle transient failure, so we create a mempool to provide the structures.
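As a rough illustration of why the request needs its own structure, here is a userspace sketch (not kernel code) of the pattern: the completion callback receives only a pointer to the embedded iocb and recovers the enclosing per-request structure from it. Every name below (demo_iocb, demo_swap_iocb, demo_pool_alloc, DEMO_QUEUED and so on) is hypothetical and invented for this example. The fixed array only stands in for a preallocated pool and returns NULL when exhausted, where a real mempool would wait for a free element. The sketch also treats only a full-page transfer as success, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical userspace stand-ins, not kernel APIs. */
#define DEMO_PAGE_SIZE 4096L
#define DEMO_POOL_SIZE 4
#define DEMO_QUEUED    (-1000L)	/* plays the role of -EIOCBQUEUED */

struct demo_iocb {
	long pos;	/* file offset of the request */
	void (*complete)(struct demo_iocb *iocb, long ret);
};

struct demo_page {
	bool uptodate;
	bool error;
};

/* The iocb plus the extra per-request details the completion needs. */
struct demo_swap_iocb {
	struct demo_iocb iocb;
	struct demo_page *page;
	bool in_use;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Preallocated pool: allocation never depends on the general allocator. */
static struct demo_swap_iocb demo_pool[DEMO_POOL_SIZE];

static struct demo_swap_iocb *demo_pool_alloc(void)
{
	for (int i = 0; i < DEMO_POOL_SIZE; i++) {
		if (!demo_pool[i].in_use) {
			demo_pool[i].in_use = true;
			return &demo_pool[i];
		}
	}
	return NULL;	/* sketch only: a real mempool would wait instead */
}

static void demo_pool_free(struct demo_swap_iocb *sio)
{
	assert(sio->in_use);
	sio->in_use = false;
}

static int demo_pool_in_use(void)
{
	int n = 0;

	for (int i = 0; i < DEMO_POOL_SIZE; i++)
		n += demo_pool[i].in_use ? 1 : 0;
	return n;
}

/* Completion: only the iocb pointer is passed in. */
static void demo_read_complete(struct demo_iocb *iocb, long ret)
{
	struct demo_swap_iocb *sio =
		demo_container_of(iocb, struct demo_swap_iocb, iocb);
	struct demo_page *page = sio->page;

	page->uptodate = (ret == DEMO_PAGE_SIZE);
	page->error = !page->uptodate;
	demo_pool_free(sio);
}

/* Hypothetical backends that finish synchronously. */
static long demo_backend_ok(struct demo_iocb *iocb)
{
	(void)iocb;
	return DEMO_PAGE_SIZE;
}

static long demo_backend_fail(struct demo_iocb *iocb)
{
	(void)iocb;
	return -5;
}

static long demo_read_page(struct demo_page *page, long pos,
			   long (*backend)(struct demo_iocb *))
{
	struct demo_swap_iocb *sio = demo_pool_alloc();
	long ret;

	if (!sio)
		return -12;
	sio->iocb.pos = pos;
	sio->iocb.complete = demo_read_complete;
	sio->page = page;

	ret = backend(&sio->iocb);
	/* If the backend did not queue the request, complete it here. */
	if (ret != DEMO_QUEUED)
		sio->iocb.complete(&sio->iocb, ret);
	return ret;
}
```

Calling demo_read_page() with demo_backend_ok marks the page up to date and returns the pool entry; with demo_backend_fail the page is flagged as in error and the entry is still released.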
Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig --- fs/cifs/file.c | 4 +++ fs/nfs/file.c | 4 +++ include/linux/fs.h | 1 + mm/page_io.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++----- mm/swap.h | 1 + mm/swapfile.c | 5 ++++ 6 files changed, 77 insertions(+), 6 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index c795d4a9ec4a..b3898c4aa5ad 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4946,6 +4946,10 @@ static int cifs_swap_activate(struct swap_info_struct *sis, cifs_dbg(FYI, "swap activate\n"); + if (!swap_file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; diff --git a/fs/nfs/file.c b/fs/nfs/file.c index d5aa55c7edb0..3dbef2c31567 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -492,6 +492,10 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); struct inode *inode = file->f_mapping->host; + if (!file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; diff --git a/include/linux/fs.h b/include/linux/fs.h index f3daaea16554..4fade3b20c87 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -409,6 +409,7 @@ struct address_space_operations { int (*swap_activate)(struct swap_info_struct *sis, struct file *file, sector_t *span); void (*swap_deactivate)(struct file *file); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; extern const struct address_space_operations empty_aops; diff --git a/mm/page_io.c b/mm/page_io.c index 34b12d6f94d7..e90a3231f225 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -284,6 +284,25 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) #define bio_associate_blkg_from_page(bio, page) do { } while (0) #endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */ +struct swap_iocb { + struct 
kiocb iocb; + struct bio_vec bvec; +}; +static mempool_t *sio_pool; + +int sio_pool_init(void) +{ + if (!sio_pool) { + mempool_t *pool = mempool_create_kmalloc_pool( + SWAP_CLUSTER_MAX, sizeof(struct swap_iocb)); + if (cmpxchg(&sio_pool, NULL, pool)) + mempool_destroy(pool); + } + if (!sio_pool) + return -ENOMEM; + return 0; +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -355,6 +374,48 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +static void sio_read_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != 0 && ret != PAGE_SIZE) { + SetPageError(page); + ClearPageUptodate(page); + pr_alert_ratelimited("Read-error on swap-device\n"); + } else { + SetPageUptodate(page); + count_vm_event(PSWPIN); + } + unlock_page(page); + mempool_free(sio, sio_pool); +} + +static int swap_readpage_fs(struct page *page) +{ + struct swap_info_struct *sis = page_swap_info(page); + struct file *swap_file = sis->swap_file; + struct address_space *mapping = swap_file->f_mapping; + struct iov_iter from; + struct swap_iocb *sio; + loff_t pos = page_file_offset(page); + int ret; + + sio = mempool_alloc(sio_pool, GFP_KERNEL); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + + iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); + return ret; +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -381,12 +442,7 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - struct file *swap_file = sis->swap_file; - struct address_space *mapping = 
swap_file->f_mapping; - - ret = mapping->a_ops->readpage(swap_file, page); - if (!ret) - count_vm_event(PSWPIN); + ret = swap_readpage_fs(page); goto out; } diff --git a/mm/swap.h b/mm/swap.h index 5c676e55f288..e8ee995cf8d8 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -3,6 +3,7 @@ #include /* for bio_end_io_t */ /* linux/mm/page_io.c */ +int sio_pool_init(void); int swap_readpage(struct page *page, bool do_poll); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); diff --git a/mm/swapfile.c b/mm/swapfile.c index ed6028aea8bf..c800c17bf0c8 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2350,6 +2350,11 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) if (ret < 0) return ret; sis->flags |= SWP_ACTIVATED; + if ((sis->flags & SWP_FS_OPS) && + sio_pool_init() != 0) { + destroy_swap_extents(sis); + return -ENOMEM; + } return ret; } From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721484 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FEDAC433EF for ; Mon, 24 Jan 2022 03:51:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235317AbiAXDvg (ORCPT ); Sun, 23 Jan 2022 22:51:36 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:56810 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236070AbiAXDvf (ORCPT ); Sun, 23 Jan 2022 22:51:35 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by 
smtp-out1.suse.de (Postfix) with ESMTPS id 8F8C721997; Mon, 24 Jan 2022 03:51:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996294; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=n/cgZ4EscdnFAYZEj1bTLNqQrrTGr07Xq40A143hCZg=; b=ILMY/8fqzrFhuGhnzWH1cISUXoOxajFChXV5nL0F6ougH4v9ZA49wxyAPika9w+KNSkjIG Ldlv1Kbfge/z8iesN89TjyDMtwHIx0eat7CA18+AsvUv9mzrqaQPrnmY4LCbEJLUWGCpi8 JbZI8wzk/DT2TJbFezEh8jd2DsrvUNM= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996294; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=n/cgZ4EscdnFAYZEj1bTLNqQrrTGr07Xq40A143hCZg=; b=3XRlusLMmRngEeDcRJoZzsb1O+QmpENg6F1W4CG9Ja5wmBVrcmYAwywsYrnnbT7xSuDx99 E3g39JDAS350EEAw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 95CE613305; Mon, 24 Jan 2022 03:51:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id GTGYFEMi7mEaRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:51:31 +0000 Subject: [PATCH 07/23] MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: 
<164299611277.26253.18349860115008677213.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org This patch switches swap-out to SWP_FS_OPS swap-spaces to use ->swap_rw and makes the writes asynchronous, like they are for other swap spaces. To make it async we need to allocate the kiocb struct from a mempool. This may block, but won't block as long as waiting for the write to complete. At most it will wait for some previous swap IO to complete. Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig --- mm/page_io.c | 93 +++++++++++++++++++++++++++++++++------------------------- 1 file changed, 53 insertions(+), 40 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index e90a3231f225..6e32ca35d9b6 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -303,6 +303,57 @@ int sio_pool_init(void) return 0; } +static void sio_write_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != 0 && ret != PAGE_SIZE) { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * folio_rotate_reclaimable but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. 
+ */ + set_page_dirty(page); + ClearPageReclaim(page); + pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", + ret, page_file_offset(page)); + } else + count_vm_event(PSWPOUT); + end_page_writeback(page); + mempool_free(sio, sio_pool); +} + +static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) +{ + struct swap_iocb *sio; + struct swap_info_struct *sis = page_swap_info(page); + struct file *swap_file = sis->swap_file; + struct address_space *mapping = swap_file->f_mapping; + struct iov_iter from; + int ret; + + set_page_writeback(page); + unlock_page(page); + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = page_file_offset(page); + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); + return ret; +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -311,46 +362,8 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, struct swap_info_struct *sis = page_swap_info(page); VM_BUG_ON_PAGE(!PageSwapCache(page), page); - if (data_race(sis->flags & SWP_FS_OPS)) { - struct kiocb kiocb; - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct bio_vec bv = { - .bv_page = page, - .bv_len = PAGE_SIZE, - .bv_offset = 0 - }; - struct iov_iter from; - - iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE); - init_sync_kiocb(&kiocb, swap_file); - kiocb.ki_pos = page_file_offset(page); - - set_page_writeback(page); - unlock_page(page); - ret = mapping->a_ops->direct_IO(&kiocb, &from); - if (ret == PAGE_SIZE) { - count_vm_event(PSWPOUT); - ret = 0; - } else { - /* - * In the case of swap-over-nfs, this can be a - * temporary 
failure if the system has limited - * memory for allocating transmit buffers. - * Mark the page dirty and avoid - * folio_rotate_reclaimable but rate-limit the - * messages but do not flag PageError like - * the normal direct-to-bio case as it could - * be temporary. - */ - set_page_dirty(page); - ClearPageReclaim(page); - pr_err_ratelimited("Write error on dio swapfile (%llu)\n", - page_file_offset(page)); - } - end_page_writeback(page); - return ret; - } + if (data_race(sis->flags & SWP_FS_OPS)) + return swap_writepage_fs(page, wbc); ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc); if (!ret) { From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721485 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3ED42C433EF for ; Mon, 24 Jan 2022 03:51:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236070AbiAXDvw (ORCPT ); Sun, 23 Jan 2022 22:51:52 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:46904 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235963AbiAXDvw (ORCPT ); Sun, 23 Jan 2022 22:51:52 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 027C61F3B1; Mon, 24 Jan 2022 03:51:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996311; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QZSVg3aKIz40nUMCvjUBG4ZEl086iHutx/VbHAUIx6g=; b=FaCT8C1KKiZslEDKX0B0nnEP4OIwXLlUiEhOwkGJDBaYzBxeeguuvTaQXFCpbtLdCapya/ Gc2MurPmUUKTe6q3WUuO/LoCDpoJZI8zBqY8GPgDElQyuk8Xh7M9lKOfy0hbsq2C1plfkz mQZPM+sgH+Qd2DcjWAilknmFULUgQ4c= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996311; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QZSVg3aKIz40nUMCvjUBG4ZEl086iHutx/VbHAUIx6g=; b=2dDfgB4kbXXac24BmsiGHxhQ/+8avwdmlQkpUQ6uxYnfQjmrENFosVnlPPSEAVdkcy7eyq VZAxN3wOy5t6E1Cw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A2AC713305; Mon, 24 Jan 2022 03:51:46 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 0DHNF1Ii7mEpRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:51:46 +0000 Subject: [PATCH 08/23] DOC: update documentation for swap_activate and swap_rw From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611278.26253.2945860698197438729.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org This documentation for 
->swap_activate() has been out-of-date for a long time. This patch updates it to match recent changes, and adds documentation for the associated ->swap_rw() Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig --- Documentation/filesystems/locking.rst | 18 ++++++++++++------ Documentation/filesystems/vfs.rst | 17 ++++++++++++----- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 3f9b1497ebb8..fbb10378d5ee 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -260,8 +260,9 @@ prototypes:: int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); int (*error_remove_page)(struct address_space *, struct page *); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); locking rules: All except set_page_dirty and freepage may block @@ -290,6 +291,7 @@ is_partially_uptodate: yes error_remove_page: yes swap_activate: no swap_deactivate: no +swap_rw: yes, unlocks ====================== ======================== ========= =============== ->write_begin(), ->write_end() and ->readpage() may be called from @@ -392,15 +394,19 @@ cleaned, or an error value if not. Note that in order to prevent the page getting mapped back in and redirtied, it needs to be kept locked across the entire operation. -->swap_activate will be called with a non-zero argument on -files backing (non block device backed) swapfiles. A return value -of zero indicates success, in which case this file can be used for -backing swapspace. The swapspace operations will be proxied to the -address space operations. +->swap_activate() will be called to prepare the given file for swap. 
It +should perform any validation and preparation necessary to ensure that +writes can be performed with minimal memory allocation. It should call +add_swap_extent(), or the helper iomap_swapfile_activate(), and return +the number of extents added. If IO should be submitted through +->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted +directly to the block device ``sis->bdev``. ->swap_deactivate() will be called in the sys_swapoff() path after ->swap_activate() returned success. +->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate(). + file_lock_operations ==================== diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index bf5c48066fac..779d23fc7954 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -751,8 +751,9 @@ cache in your filesystem. The following members are defined: unsigned long); void (*is_dirty_writeback) (struct page *, bool *, bool *); int (*error_remove_page) (struct mapping *mapping, struct page *page); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; ``writepage`` @@ -959,15 +960,21 @@ cache in your filesystem. The following members are defined: unless you have them locked or reference counts increased. ``swap_activate`` - Called when swapon is used on a file to allocate space if - necessary and pin the block lookup information in memory. A - return value of zero indicates success, in which case this file - can be used to back swapspace. + + Called to prepare the given file for swap. It should perform + any validation and preparation necessary to ensure that writes + can be performed with minimal memory allocation. It should call + add_swap_extent(), or the helper iomap_swapfile_activate(), and + return the number of extents added. 
If IO should be submitted + through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will + be submitted directly to the block device ``sis->bdev``. ``swap_deactivate`` Called during swapoff on files where swap_activate was successful. +``swap_rw`` + Called to read or write swap pages when SWP_FS_OPS is set. The File Object =============== From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721486 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2550C433FE for ; Mon, 24 Jan 2022 03:52:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232710AbiAXDwJ (ORCPT ); Sun, 23 Jan 2022 22:52:09 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:46932 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241187AbiAXDwI (ORCPT ); Sun, 23 Jan 2022 22:52:08 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 2EED41F3B1; Mon, 24 Jan 2022 03:52:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996327; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hXM27Ktz0q7GrqgI86euZLVhJuAWozKSUgPAu9ALmpc=; b=u9i5KjaF6Cn4C7Alxy2UjJlqk65gob4QWaDmCWNlnT1O4xjgGA+Z3KaH6ZGLjw6ZMxg2QY 
4qRdEQqZYJLT4JlCjLnqSzYy626J3zS783DwTjolcktJzNbfRqKX7Os4zCsqJ09CoJF4O6 2EYKf06c+qUdHdjMGtxkaibGjoRviEI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996327; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hXM27Ktz0q7GrqgI86euZLVhJuAWozKSUgPAu9ALmpc=; b=wkiiDO1OlmC/xnDFA1Jecbo9BhxrJEP6/taW6KXQ22Ol8DMlaOkgSXO1Q8SQriumxjjhd3 5wuO9Iykg35+76Aw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2F74E13305; Mon, 24 Jan 2022 03:52:01 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id yE+5N2Ei7mFCRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:52:01 +0000 Subject: [PATCH 09/23] MM: submit multipage reads for SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611278.26253.14950274629759580371.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org swap_readpage() is given one page at a time, but may be called repeatedly in succession. For block-device swapspace, the blk_plug functionality allows the multiple pages to be combined together at lower layers.
That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads. With this patch we pass in a pointer-to-pointer where swap_readpage() can store state between calls - much like the effect of blk_plug. After calling swap_readpage() some number of times, the state will be passed to swap_read_unplug() which can submit the combined request. Some callers currently call blk_finish_plug() *before* the final call to swap_readpage(), so the last page cannot be included. This patch moves blk_finish_plug() to after the last call, and calls swap_read_unplug() there too. Signed-off-by: NeilBrown Reported-by: kernel test robot Reviewed-by: Christoph Hellwig --- mm/madvise.c | 8 +++- mm/memory.c | 2 + mm/page_io.c | 102 +++++++++++++++++++++++++++++++++++-------------------- mm/swap.h | 16 +++++++-- mm/swap_state.c | 19 +++++++--- 5 files changed, 98 insertions(+), 49 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 1ee4b7583379..2b1ab30af141 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -225,6 +225,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + struct swap_iocb *splug = NULL; if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; @@ -246,10 +247,11 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, continue; page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, - vma, index, false); + vma, index, false, &splug); if (page) put_page(page); } + swap_read_unplug(splug); return 0; } @@ -265,6 +267,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start)); pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1); struct page *page; + struct swap_iocb *splug = NULL; rcu_read_lock(); xas_for_each(&xas, page, end_index) { @@ -277,13 +280,14
@@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, swap = radix_to_swp_entry(page); page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE, - NULL, 0, false); + NULL, 0, false, &splug); if (page) put_page(page); rcu_read_lock(); } rcu_read_unlock(); + swap_read_unplug(splug); lru_add_drain(); /* Push any new pages onto the LRU now */ } diff --git a/mm/memory.c b/mm/memory.c index d25372340107..8bd18c54eaa4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3559,7 +3559,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* To provide entry to swap_readpage() */ set_page_private(page, entry.val); - swap_readpage(page, true); + swap_readpage(page, true, NULL); set_page_private(page, 0); } } else { diff --git a/mm/page_io.c b/mm/page_io.c index 6e32ca35d9b6..bcf655d650c8 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -286,7 +286,8 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) struct swap_iocb { struct kiocb iocb; - struct bio_vec bvec; + struct bio_vec bvec[SWAP_CLUSTER_MAX]; + int pages; }; static mempool_t *sio_pool; @@ -306,7 +307,7 @@ int sio_pool_init(void) static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec.bv_page; + struct page *page = sio->bvec[0].bv_page; if (ret != 0 && ret != PAGE_SIZE) { /* @@ -344,10 +345,10 @@ static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) init_sync_kiocb(&sio->iocb, swap_file); sio->iocb.ki_complete = sio_write_complete; sio->iocb.ki_pos = page_file_offset(page); - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + sio->bvec[0].bv_page = page; + sio->bvec[0].bv_len = PAGE_SIZE; + sio->bvec[0].bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); ret = mapping->a_ops->swap_rw(&sio->iocb, &from); if (ret != -EIOCBQUEUED) 
sio_write_complete(&sio->iocb, ret); @@ -390,46 +391,60 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec.bv_page; - - if (ret != 0 && ret != PAGE_SIZE) { - SetPageError(page); - ClearPageUptodate(page); - pr_alert_ratelimited("Read-error on swap-device\n"); - } else { - SetPageUptodate(page); - count_vm_event(PSWPIN); + int p; + + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + if (ret != 0 && ret != PAGE_SIZE * sio->pages) { + SetPageError(page); + ClearPageUptodate(page); + pr_alert_ratelimited("Read-error on swap-device\n"); + } else { + SetPageUptodate(page); + count_vm_event(PSWPIN); + } + unlock_page(page); } - unlock_page(page); mempool_free(sio, sio_pool); } -static int swap_readpage_fs(struct page *page) +static void swap_readpage_fs(struct page *page, + struct swap_iocb **plug) { struct swap_info_struct *sis = page_swap_info(page); - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; loff_t pos = page_file_offset(page); - int ret; - - sio = mempool_alloc(sio_pool, GFP_KERNEL); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_pos = pos; - sio->iocb.ki_complete = sio_read_complete; - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_read_complete(&sio->iocb, ret); - return ret; + if (*plug) + sio = *plug; + if (sio) { + if (sio->iocb.ki_filp != sis->swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_read_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_KERNEL); + 
init_sync_kiocb(&sio->iocb, sis->swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) { + swap_read_unplug(sio); + sio = NULL; + } + if (plug) + *plug = sio; } -int swap_readpage(struct page *page, bool synchronous) +int swap_readpage(struct page *page, bool synchronous, + struct swap_iocb **plug) { struct bio *bio; int ret = 0; @@ -455,7 +470,7 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - ret = swap_readpage_fs(page); + swap_readpage_fs(page, plug); goto out; } @@ -507,3 +522,16 @@ int swap_readpage(struct page *page, bool synchronous) delayacct_swapin_end(); return ret; } + +void __swap_read_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, READ, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); +} diff --git a/mm/swap.h b/mm/swap.h index e8ee995cf8d8..0c79b2478f3f 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -4,7 +4,15 @@ /* linux/mm/page_io.c */ int sio_pool_init(void); -int swap_readpage(struct page *page, bool do_poll); +struct swap_iocb; +int swap_readpage(struct page *page, bool do_poll, + struct swap_iocb **plug); +void __swap_read_unplug(struct swap_iocb *plug); +static inline void swap_read_unplug(struct swap_iocb *plug) +{ + if (unlikely(plug)) + __swap_read_unplug(plug); +} int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct writeback_control *wbc, @@ -38,7 +46,8 @@ struct page *find_get_incore_page(struct address_space *mapping, 
pgoff_t index); struct page *read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long addr, - bool do_poll); + bool do_poll, + struct swap_iocb **plug); struct page *__read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long addr, @@ -53,7 +62,8 @@ static inline unsigned int page_swap_flags(struct page *page) return page_swap_info(page)->flags; } #else /* CONFIG_SWAP */ -static inline int swap_readpage(struct page *page, bool do_poll) +static inline int swap_readpage(struct page *page, bool do_poll, + struct swap_iocb **plug) { return 0; } diff --git a/mm/swap_state.c b/mm/swap_state.c index d541594be1c3..5cb2c75fa247 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -520,14 +520,16 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * the swap entry is no longer in use. */ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, unsigned long addr, bool do_poll) + struct vm_area_struct *vma, + unsigned long addr, bool do_poll, + struct swap_iocb **plug) { bool page_was_allocated; struct page *retpage = __read_swap_cache_async(entry, gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(retpage, do_poll, plug); return retpage; } @@ -621,6 +623,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, unsigned long mask; struct swap_info_struct *si = swp_swap_info(entry); struct blk_plug plug; + struct swap_iocb *splug = NULL; bool do_poll = true, page_allocated; struct vm_area_struct *vma = vmf->vma; unsigned long addr = vmf->address; @@ -647,7 +650,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (offset != entry_offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -658,8 +661,10 @@ struct page
*swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, lru_add_drain(); /* Push any new pages onto the LRU now */ skip: - page = read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll); + page = read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, + &splug); blk_finish_plug(&plug); + swap_read_unplug(splug); return page; } @@ -791,6 +796,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct vm_fault *vmf) { struct blk_plug plug; + struct swap_iocb *splug = NULL; struct vm_area_struct *vma = vmf->vma; struct page *page; pte_t *pte, pentry; @@ -821,7 +827,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (i != ra_info.offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -832,8 +838,9 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, lru_add_drain(); skip: page = read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, - ra_info.win == 1); + ra_info.win == 1, &splug); blk_finish_plug(&plug); + swap_read_unplug(splug); return page; } From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721487 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28130C433F5 for ; Mon, 24 Jan 2022 03:52:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231803AbiAXDwj (ORCPT ); Sun, 23 Jan 2022 22:52:39 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:46964 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241192AbiAXDwj (ORCPT ); Sun, 23 Jan 2022 22:52:39 -0500 Received: from 
imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 0CEDE1F3B1; Mon, 24 Jan 2022 03:52:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996358; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ilA7OZh9lH4lfKt0EXiI3v3DOr+k3/ZmJc6w2w8IIto=; b=TUDcaGrH3IQ4jbI1LjGtfJCbku6HgfJ1SA/e6tbwpuxhP2zeP7roeBtqcf29FStJv3t1kA JOxNBGux+SkxQBPCG4okM7MhzqvdeTyyO4puxMZp2QQgWOQETuLMe+PB8FmMINrXEIGO5T vod7VaeEvCsdi4n6uSwxs8g5GqZUM74= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996358; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ilA7OZh9lH4lfKt0EXiI3v3DOr+k3/ZmJc6w2w8IIto=; b=nhz08CByEzBnGB2LFv8iQj+QHte3XP0ue/V1xUBzjFDMFI8LovcXTXc5TKw/QFIcbkrbAf mEcxnij1qU2BYLAQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id B5DED13305; Mon, 24 Jan 2022 03:52:34 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id UruJHIIi7mFkRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:52:34 +0000 Subject: [PATCH 10/23] MM: submit multipage write for SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck 
Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611279.26253.12350012848236496937.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org swap_writepage() is given one page at a time, but may be called repeatedly in succession. For block-device swapspace, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap writes over NFS are single page writes. With this patch we pass a pointer-to-pointer via the wbc. swap_writepage can store state between calls - much like the pointer passed explicitly to swap_readpage. After calling swap_writepage() some number of times, the state will be passed to swap_write_unplug() which can submit the combined request. Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Reported-by: kernel test robot --- include/linux/writeback.h | 7 +++ mm/page_io.c | 103 +++++++++++++++++++++++++++++---------------- mm/swap.h | 1 mm/vmscan.c | 9 +++- 4 files changed, 82 insertions(+), 38 deletions(-) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index fec248ab1fec..6dcaa0639c0d 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -80,6 +80,13 @@ struct writeback_control { unsigned punt_to_cgroup:1; /* cgrp punting, see __REQ_CGROUP_PUNT */ + /* To enable batching of swap writes to non-block-device backends, + * "plug" can be set to point to a 'struct swap_iocb *'. When all swap + * writes have been submitted, if the swap_iocb is not NULL, + * swap_write_unplug() should be called.
+ */ + struct swap_iocb **plug; + #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ struct inode *inode; /* inode being written out */ diff --git a/mm/page_io.c b/mm/page_io.c index bcf655d650c8..b61c2cafc4f9 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -307,56 +307,74 @@ int sio_pool_init(void) static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec[0].bv_page; + int p; - if (ret != 0 && ret != PAGE_SIZE) { - /* - * In the case of swap-over-nfs, this can be a - * temporary failure if the system has limited - * memory for allocating transmit buffers. - * Mark the page dirty and avoid - * folio_rotate_reclaimable but rate-limit the - * messages but do not flag PageError like - * the normal direct-to-bio case as it could - * be temporary. - */ - set_page_dirty(page); - ClearPageReclaim(page); - pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", - ret, page_file_offset(page)); - } else - count_vm_event(PSWPOUT); - end_page_writeback(page); + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + + if (ret != 0 && ret != PAGE_SIZE * sio->pages) { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * folio_rotate_reclaimable but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. 
+ */ + set_page_dirty(page); + ClearPageReclaim(page); + pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", + ret, page_file_offset(page)); + } else + count_vm_event(PSWPOUT); + end_page_writeback(page); + } mempool_free(sio, sio_pool); } static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) { - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; struct swap_info_struct *sis = page_swap_info(page); struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; - int ret; + loff_t pos = page_file_offset(page); set_page_writeback(page); unlock_page(page); - sio = mempool_alloc(sio_pool, GFP_NOIO); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_complete = sio_write_complete; - sio->iocb.ki_pos = page_file_offset(page); - sio->bvec[0].bv_page = page; - sio->bvec[0].bv_len = PAGE_SIZE; - sio->bvec[0].bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_write_complete(&sio->iocb, ret); - return ret; + if (wbc->plug) + sio = *wbc->plug; + if (sio) { + if (sio->iocb.ki_filp != swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_write_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = pos; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->plug) { + swap_write_unplug(sio); + sio = NULL; + } + if (wbc->plug) + *wbc->plug = sio; + + return 0; } int __swap_writepage(struct page *page, struct writeback_control *wbc, - bio_end_io_t end_write_func) + bio_end_io_t end_write_func) { struct bio *bio; int ret; @@ -388,6 +406,19 @@ int 
__swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +void swap_write_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, WRITE, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); +} + static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); diff --git a/mm/swap.h b/mm/swap.h index 0c79b2478f3f..0194ac153d40 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -13,6 +13,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug) if (unlikely(plug)) __swap_read_unplug(plug); } +void swap_write_unplug(struct swap_iocb *sio); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct writeback_control *wbc, diff --git a/mm/vmscan.c b/mm/vmscan.c index ad5026d06aa8..f75c71490921 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1164,7 +1164,8 @@ typedef enum { * pageout is called by shrink_page_list() for each dirty page. * Calls ->writepage(). 
*/ -static pageout_t pageout(struct page *page, struct address_space *mapping) +static pageout_t pageout(struct page *page, struct address_space *mapping, + struct swap_iocb **plug) { /* * If the page is dirty, only perform writeback if that write @@ -1211,6 +1212,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping) .range_start = 0, .range_end = LLONG_MAX, .for_reclaim = 1, + .plug = plug, }; SetPageReclaim(page); @@ -1537,6 +1539,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, unsigned int nr_reclaimed = 0; unsigned int pgactivate = 0; bool do_demote_pass; + struct swap_iocb *plug = NULL; memset(stat, 0, sizeof(*stat)); cond_resched(); @@ -1817,7 +1820,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, * starts and then write it out here. */ try_to_unmap_flush_dirty(); - switch (pageout(page, mapping)) { + switch (pageout(page, mapping, &plug)) { case PAGE_KEEP: goto keep_locked; case PAGE_ACTIVATE: @@ -1971,6 +1974,8 @@ static unsigned int shrink_page_list(struct list_head *page_list, list_splice(&ret_pages, page_list); count_vm_events(PGACTIVATE, pgactivate); + if (plug) + swap_write_unplug(plug); return nr_reclaimed; } From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721488 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 870A5C433FE for ; Mon, 24 Jan 2022 03:52:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241192AbiAXDwx (ORCPT ); Sun, 23 Jan 2022 22:52:53 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:56842 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241200AbiAXDwx (ORCPT ); Sun, 23 Jan 
2022 22:52:53 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id EE71321997; Mon, 24 Jan 2022 03:52:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996371; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Ceo+7tr0pXIBwAY1V2k9H3/l1Y7n6GRK1pkwlK65lYo=; b=lLj4nf8igFAfu4Ik+74JXEmmTkQOHU4byW570y8EKr/iGTF1XOx9dFTRJvtFTF6XgWUduL 2NwVIJGOx6Jt3sfVwWSd4zScsuc7XrTD5boEV1xk1LPkO9FMpCAE7u/gcFVMSrpmPsAg6d wMplTJN7lpMlw7j42XnSn18GcAp/K/8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996371; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Ceo+7tr0pXIBwAY1V2k9H3/l1Y7n6GRK1pkwlK65lYo=; b=4nsYAxNuhOIH2b+do3G/Wsh7rlmS9fAYTdnoiNsVKN95nUj/Ybjbxv+UeR5ZDCbDy04sjM UTFqN0IZE3Kv8YDw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 14D3913305; Mon, 24 Jan 2022 03:52:48 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 5qpHMZAi7mF4RQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:52:48 +0000 Subject: [PATCH 11/23] VFS: Add FMODE_CAN_ODIRECT file flag From: NeilBrown To: Trond Myklebust , Anna 
Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611280.26253.2845018521780218144.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Currently various places test if direct IO is possible on a file by checking for the existence of the direct_IO address space operation. This is a poor choice, as the direct_IO operation may not be used - it is only used if the generic_file_*_iter functions are called for direct IO and some filesystems - particularly NFS - don't do this. Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the various places to check this (avoiding pointer dereferences). do_dentry_open() will set this flag if ->direct_IO is present, so filesystems do not need to be changed. NFS *is* changed, to set the flag explicitly and discard the direct_IO entry in the address_space_operations for files. 
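As an illustration only (not part of this patch), the record-once, test-later pattern described above can be sketched in plain userspace C. Every name prefixed demo_ or DEMO_ below is a hypothetical stand-in for the kernel type or flag it resembles, and the constant values are arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for O_DIRECT and FMODE_CAN_ODIRECT; values are arbitrary. */
#define DEMO_O_DIRECT          0x4000u
#define DEMO_FMODE_CAN_ODIRECT 0x400000u

/* Stand-in for address_space_operations: only the member of interest. */
struct demo_aops {
	int (*direct_IO)(void);
};

/* Stand-in for struct file, with the mapping's a_ops folded in. */
struct demo_file {
	unsigned int f_flags;
	unsigned int f_mode;
	const struct demo_aops *a_ops;
};

/* A dummy ->direct_IO implementation so the pointer can be non-NULL. */
static int demo_direct_io(void)
{
	return 0;
}

/* Open-time step: record direct IO capability once, as a mode flag. */
static void demo_note_odirect(struct demo_file *f)
{
	if (f->a_ops && f->a_ops->direct_IO)
		f->f_mode |= DEMO_FMODE_CAN_ODIRECT;
}

/* Later checks test the flag instead of dereferencing a_ops again. */
static int demo_check_odirect(const struct demo_file *f)
{
	if ((f->f_flags & DEMO_O_DIRECT) &&
	    !(f->f_mode & DEMO_FMODE_CAN_ODIRECT))
		return -EINVAL;
	return 0;
}
```

A file whose open routine sets the flag explicitly, as NFS does in this patch, passes demo_check_odirect() even when its a_ops has no direct_IO member.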
Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig --- drivers/block/loop.c | 4 ++-- fs/fcntl.c | 9 ++++----- fs/nfs/file.c | 3 ++- fs/open.c | 9 ++++----- fs/overlayfs/file.c | 13 ++++--------- include/linux/fs.h | 3 +++ 6 files changed, 19 insertions(+), 22 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 01cbbfc4e9e2..a2609dd79370 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -184,8 +184,8 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) */ if (dio) { if (queue_logical_block_size(lo->lo_queue) >= sb_bsize && - !(lo->lo_offset & dio_align) && - mapping->a_ops->direct_IO) + !(lo->lo_offset & dio_align) && + (file->f_mode & FMODE_CAN_ODIRECT)) use_dio = true; else use_dio = false; diff --git a/fs/fcntl.c b/fs/fcntl.c index 9c6c6a3e2de5..11e665242a76 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -56,11 +56,10 @@ static int setfl(int fd, struct file * filp, unsigned long arg) arg |= O_NONBLOCK; /* Pipe packetized mode is controlled by O_DIRECT flag */ - if (!S_ISFIFO(inode->i_mode) && (arg & O_DIRECT)) { - if (!filp->f_mapping || !filp->f_mapping->a_ops || - !filp->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if (!S_ISFIFO(inode->i_mode) && + (arg & O_DIRECT) && + !(filp->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; if (filp->f_op->check_flags) error = filp->f_op->check_flags(arg); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 3dbef2c31567..9e2def045111 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -74,6 +74,8 @@ nfs_file_open(struct inode *inode, struct file *filp) return res; res = nfs_open(inode, filp); + if (res == 0) + filp->f_mode |= FMODE_CAN_ODIRECT; return res; } @@ -535,7 +537,6 @@ const struct address_space_operations nfs_file_aops = { .write_end = nfs_write_end, .invalidatepage = nfs_invalidate_page, .releasepage = nfs_release_page, - .direct_IO = nfs_direct_IO, #ifdef CONFIG_MIGRATION .migratepage = nfs_migrate_page, #endif diff --git a/fs/open.c b/fs/open.c index 
9ff2f621b760..76ddf9014499 100644 --- a/fs/open.c +++ b/fs/open.c @@ -834,17 +834,16 @@ static int do_dentry_open(struct file *f, if ((f->f_mode & FMODE_WRITE) && likely(f->f_op->write || f->f_op->write_iter)) f->f_mode |= FMODE_CAN_WRITE; + if (f->f_mapping->a_ops && f->f_mapping->a_ops->direct_IO) + f->f_mode |= FMODE_CAN_ODIRECT; f->f_write_hint = WRITE_LIFE_NOT_SET; f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC); file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping); - /* NB: we're sure to have correct a_ops only after f_op->open */ - if (f->f_flags & O_DIRECT) { - if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; /* * XXX: Huge page cache doesn't support writing yet. Drop all page diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index fa125feed0ff..9d69b4dbb8c4 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -82,11 +82,8 @@ static int ovl_change_flags(struct file *file, unsigned int flags) if (((flags ^ file->f_flags) & O_APPEND) && IS_APPEND(inode)) return -EPERM; - if (flags & O_DIRECT) { - if (!file->f_mapping->a_ops || - !file->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if ((flags & O_DIRECT) && !(file->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; if (file->f_op->check_flags) { err = file->f_op->check_flags(flags); @@ -306,8 +303,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !(real.file->f_mode & FMODE_CAN_ODIRECT)) goto out_fdput; old_cred = ovl_override_creds(file_inode(file)->i_sb); @@ -367,8 +363,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !(real.file->f_mode & 
FMODE_CAN_ODIRECT)) goto out_fdput; if (!ovl_should_sync(OVL_FS(inode->i_sb))) diff --git a/include/linux/fs.h b/include/linux/fs.h index 4fade3b20c87..48c021cbe327 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -161,6 +161,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File is stream-like */ #define FMODE_STREAM ((__force fmode_t)0x200000) +/* File supports DIRECT IO */ +#define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000) + /* File was opened by fanotify and shouldn't generate fanotify events */ #define FMODE_NONOTIFY ((__force fmode_t)0x4000000) From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721489 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 315D5C433FE for ; Mon, 24 Jan 2022 03:53:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241204AbiAXDxK (ORCPT ); Sun, 23 Jan 2022 22:53:10 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:56866 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241200AbiAXDxK (ORCPT ); Sun, 23 Jan 2022 22:53:10 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E11EA21997; Mon, 24 Jan 2022 03:53:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996388; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w8lR7J8JNn9z3HgJi0/q20xbPjyDJKVj8xh3UiEzhOQ=; b=BIFtq0WxNoK33RbSSnkbhn7EkwZIJe9uuHRxyCZ03NZ7aYWSjcVp24hiAUh1n0iZ2FW66L KiFctLtDEFU4K29rI7mnmXznzvDMex258aT9xsPPVtAk0JwooEtTFgRa2cnmEDBVQaOpA0 MSFdLM9c+p5eGfwo5yPWmePVfpPljq0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996388; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w8lR7J8JNn9z3HgJi0/q20xbPjyDJKVj8xh3UiEzhOQ=; b=T7wT1d1qNNR2PFPLlpZBeDaBKmfxRWNTJgT+STJswMATGxn3qssrMiMMOHHdaw5UYROrSd 59Joc+ibKdu2eGBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D33C113305; Mon, 24 Jan 2022 03:53:05 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id XkU8I6Ei7mGURQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:53:05 +0000 Subject: [PATCH 12/23] NFS: remove IS_SWAPFILE hack From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611280.26253.6924680050876339981.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org This code is pointless as IS_SWAPFILE is always defined. 
So remove it. Suggested-by: Mark Hemment Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig --- fs/nfs/file.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 9e2def045111..4d4750738aeb 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -44,11 +44,6 @@ static const struct vm_operations_struct nfs_file_vm_ops; -/* Hack for future NFS swap support */ -#ifndef IS_SWAPFILE -# define IS_SWAPFILE(inode) (0) -#endif - int nfs_check_flags(int flags) { if ((flags & (O_APPEND | O_DIRECT)) == (O_APPEND | O_DIRECT)) From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721504 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62602C433EF for ; Mon, 24 Jan 2022 03:53:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241208AbiAXDx0 (ORCPT ); Sun, 23 Jan 2022 22:53:26 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:56942 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241200AbiAXDx0 (ORCPT ); Sun, 23 Jan 2022 22:53:26 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 53FD221997; Mon, 24 Jan 2022 03:53:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996405; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=0qDluxkEK7zAySAVPGR4jtj1uxF720K50iXEZFn4nZw=; b=eRwoWwRXnjPsBO3djnZMjJjfXy7qs9WlcJ/U05UJ1Htu0k9+tgoiUkWgFyIO6B29ShV8Yc szunxTaUFtRfKqDvTWQ1CP0bwDLnhzYjrT2DHHgKivyIQXwYAw+bq1VYPIavB6tidNPxQL Q6Kgu0EqJEJ3nwRSiUDhQ6OnJfKo13c= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996405; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0qDluxkEK7zAySAVPGR4jtj1uxF720K50iXEZFn4nZw=; b=H1ZRzb7v9Zn/6SXSWIyscE0WL4iA877N13xXsuRAChNlPO7SX2B7wqhNzPSC5fHJHx4tGG 0dT3+we+sNrn/2Bw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5BDE613305; Mon, 24 Jan 2022 03:53:20 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id xu0DBrAi7mGsRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:53:20 +0000 Subject: [PATCH 13/23] NFS: rename nfs_direct_IO and use as ->swap_rw From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611281.26253.6497855219394305186.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a while. 
We now need a ->swap_rw function which behaves slightly differently,
returning zero for success rather than a byte count.

So modify nfs_direct_IO accordingly, rename it, and use it as the
->swap_rw function.

Note: it still won't work - that will be fixed in later patches.

Signed-off-by: NeilBrown
Reviewed-by: Christoph Hellwig
---
 fs/nfs/direct.c        | 23 ++++++++++-------------
 fs/nfs/file.c          |  5 +----
 include/linux/nfs_fs.h |  2 +-
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index eabfdab543c8..b929dd5b0c3a 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -153,28 +153,25 @@ nfs_direct_count_bytes(struct nfs_direct_req *dreq,
 }
 
 /**
- * nfs_direct_IO - NFS address space operation for direct I/O
+ * nfs_swap_rw - NFS address space operation for swap I/O
  * @iocb: target I/O control block
  * @iter: I/O buffer
  *
- * The presence of this routine in the address space ops vector means
- * the NFS client supports direct I/O. However, for most direct IO, we
- * shunt off direct read and write requests before the VFS gets them,
- * so this method is only ever called for swap.
+ * Perform IO to the swap-file. This is much like direct IO.
  */
-ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
 {
-	struct inode *inode = iocb->ki_filp->f_mapping->host;
-
-	/* we only support swap file calling nfs_direct_IO */
-	if (!IS_SWAPFILE(inode))
-		return 0;
+	ssize_t ret;
 
 	VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
 
 	if (iov_iter_rw(iter) == READ)
-		return nfs_file_direct_read(iocb, iter);
-	return nfs_file_direct_write(iocb, iter);
+		ret = nfs_file_direct_read(iocb, iter);
+	else
+		ret = nfs_file_direct_write(iocb, iter);
+	if (ret < 0)
+		return ret;
+	return 0;
 }
 
 static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 4d4750738aeb..91ff9ed05b06 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -489,10 +489,6 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
 	struct inode *inode = file->f_mapping->host;
 
-	if (!file->f_mapping->a_ops->swap_rw)
-		/* Cannot support swap */
-		return -EINVAL;
-
 	spin_lock(&inode->i_lock);
 	blocks = inode->i_blocks;
 	isize = inode->i_size;
@@ -540,6 +536,7 @@ const struct address_space_operations nfs_file_aops = {
 	.error_remove_page = generic_error_remove_page,
 	.swap_activate = nfs_swap_activate,
 	.swap_deactivate = nfs_swap_deactivate,
+	.swap_rw = nfs_swap_rw,
 };
 
 /*
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 00835bacd236..29a5e579f26f 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -509,7 +509,7 @@ static inline const struct cred *nfs_file_cred(struct file *file)
 /*
  * linux/fs/nfs/direct.c
  */
-extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *);
+extern int nfs_swap_rw(struct kiocb *, struct iov_iter *);
 extern ssize_t nfs_file_direct_read(struct kiocb *iocb,
 			struct iov_iter *iter);
 extern ssize_t nfs_file_direct_write(struct kiocb *iocb,

From patchwork Mon Jan 24 03:48:32 2022 Content-Type:
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721505 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92285C433FE for ; Mon, 24 Jan 2022 03:53:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241200AbiAXDxg (ORCPT ); Sun, 23 Jan 2022 22:53:36 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:56970 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241222AbiAXDxg (ORCPT ); Sun, 23 Jan 2022 22:53:36 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 264F721997; Mon, 24 Jan 2022 03:53:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996415; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=x65zmmwMPjQQ4i8377MUZFK2DfVNQk3gdsKgnDdhJOY=; b=VeD+Dk1LlTNVRdYGmbcLCv0OBI0ELzcGtVOsx0LIjbyDi03Cjm9yXGwJ6FTiAGOd4RLtI4 mLcsr+9eexpa30sTypMVUyZmtuYBQSi+Y2IOseEc7tg/BSPhlAVTgcdh1ICdThARjc5mIu v4s0Y2iOY1w9iI8kjTfffqBN0y151+k= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996415; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=x65zmmwMPjQQ4i8377MUZFK2DfVNQk3gdsKgnDdhJOY=; b=874/rP8Dzx/B+xlfhQzuKKwZ1zuCwD/Niwj0KCkxxJc144dr93O+VhDoe4pyId5LWaTLcg tR6rwsuRD+WQjiAA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2B1B713305; Mon, 24 Jan 2022 03:53:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id a1wwNrsi7mG5RQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:53:31 +0000 Subject: [PATCH 14/23] NFS: swap IO handling is slightly different for O_DIRECT IO From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611281.26253.15560926531007295753.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). 
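
For readers who want the two points above in concrete form, here is a rough userspace sketch of the control flow that a "swap" flag selects in the direct write path. It is not kernel code: sketch_inode, sketch_write_checks() and sketch_direct_write() are names invented for this illustration, the lock is modelled as a plain counter standing in for i_rwsem, and the -ETXTBSY result is my understanding of what generic_write_checks() returns for a swapfile, so treat it as an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; not a kernel structure. */
struct sketch_inode {
	bool is_swapfile;	/* file is currently in use as swap */
	int lock_taken;		/* counts acquisitions of the modelled i_rwsem */
};

/* Models the generic write checks: writes to swap are refused
 * (assumed here to be -ETXTBSY). */
static long sketch_write_checks(const struct sketch_inode *inode, size_t count)
{
	if (inode->is_swapfile)
		return -ETXTBSY;
	return (long)count;
}

/* Models the direct write path: swap IO skips both the checks and the lock. */
static long sketch_direct_write(struct sketch_inode *inode, size_t count,
				bool swap)
{
	long result;

	if (!swap)
		result = sketch_write_checks(inode, count);
	else
		result = (long)count;	/* bypass generic checks */
	if (result <= 0)
		return result;

	if (!swap)
		inode->lock_taken++;	/* nfs_start_io_direct() in the real code */
	/* ... the write itself would be scheduled here ... */
	return result;
}
```

With swap set, a write to a swapfile goes through and the modelled lock is never touched. With swap clear, the same write is refused by the checks, which is why the swap path cannot simply reuse the O_DIRECT entry point unchanged.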
Signed-off-by: NeilBrown
Reviewed-by: Christoph Hellwig
---
 fs/nfs/direct.c        | 30 +++++++++++++++++++++---------
 fs/nfs/file.c          |  4 ++--
 include/linux/nfs_fs.h |  4 ++--
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index b929dd5b0c3a..43a956d7fd62 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -166,9 +166,9 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
 	VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
 
 	if (iov_iter_rw(iter) == READ)
-		ret = nfs_file_direct_read(iocb, iter);
+		ret = nfs_file_direct_read(iocb, iter, true);
 	else
-		ret = nfs_file_direct_write(iocb, iter);
+		ret = nfs_file_direct_write(iocb, iter, true);
 	if (ret < 0)
 		return ret;
 	return 0;
@@ -422,6 +422,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
  * nfs_file_direct_read - file direct read operation for NFS files
  * @iocb: target I/O control block
  * @iter: vector of user buffers into which to read data
+ * @swap: flag indicating this is swap IO, not O_DIRECT IO
  *
  * We use this function for direct reads instead of calling
  * generic_file_aio_read() in order to avoid gfar's check to see if
@@ -437,7 +438,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
  * client must read the updated atime from the server back into its
  * cache.
  */
-ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
+ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
+			     bool swap)
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
@@ -479,12 +481,14 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
 	if (iter_is_iovec(iter))
 		dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
 
-	nfs_start_io_direct(inode);
+	if (!swap)
+		nfs_start_io_direct(inode);
 
 	NFS_I(inode)->read_io += count;
 	requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
 
-	nfs_end_io_direct(inode);
+	if (!swap)
+		nfs_end_io_direct(inode);
 
 	if (requested > 0) {
 		result = nfs_direct_wait(dreq);
@@ -873,6 +877,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
  * nfs_file_direct_write - file direct write operation for NFS files
  * @iocb: target I/O control block
  * @iter: vector of user buffers from which to write data
+ * @swap: flag indicating this is swap IO, not O_DIRECT IO
  *
  * We use this function for direct writes instead of calling
  * generic_file_aio_write() in order to avoid taking the inode
@@ -889,7 +894,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
  * Note that O_APPEND is not supported for NFS direct writes, as there
  * is no atomic O_APPEND write facility in the NFS protocol.
  */
-ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
+ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
+			      bool swap)
 {
 	ssize_t result, requested;
 	size_t count;
@@ -903,7 +909,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 	dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n",
 		file, iov_iter_count(iter), (long long) iocb->ki_pos);
 
-	result = generic_write_checks(iocb, iter);
+	if (!swap)
+		result = generic_write_checks(iocb, iter);
+	else
+		/* bypass generic checks */
+		result = iov_iter_count(iter);
 	if (result <= 0)
 		return result;
 	count = result;
@@ -934,7 +944,8 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 		dreq->iocb = iocb;
 	pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode);
 
-	nfs_start_io_direct(inode);
+	if (!swap)
+		nfs_start_io_direct(inode);
 
 	requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
 
@@ -943,7 +954,8 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 					      pos >> PAGE_SHIFT, end);
 	}
 
-	nfs_end_io_direct(inode);
+	if (!swap)
+		nfs_end_io_direct(inode);
 
 	if (requested > 0) {
 		result = nfs_direct_wait(dreq);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 91ff9ed05b06..04ba56f223d3 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -159,7 +159,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
 	ssize_t result;
 
 	if (iocb->ki_flags & IOCB_DIRECT)
-		return nfs_file_direct_read(iocb, to);
+		return nfs_file_direct_read(iocb, to, false);
 
 	dprintk("NFS: read(%pD2, %zu@%lu)\n",
 		iocb->ki_filp,
@@ -625,7 +625,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 		return result;
 
 	if (iocb->ki_flags & IOCB_DIRECT)
-		return nfs_file_direct_write(iocb, from);
+		return nfs_file_direct_write(iocb, from, false);
 
 	dprintk("NFS: write(%pD2, %zu@%Ld)\n",
 		file, iov_iter_count(from), (long long) iocb->ki_pos);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 29a5e579f26f..aba38dc4fd29 100644
---
a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -511,9 +511,9 @@ static inline const struct cred *nfs_file_cred(struct file *file) */ extern int nfs_swap_rw(struct kiocb *, struct iov_iter *); extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - struct iov_iter *iter); + struct iov_iter *iter, bool swap); extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); + struct iov_iter *iter, bool swap); /* * linux/fs/nfs/dir.c From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721506 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1B11C433EF for ; Mon, 24 Jan 2022 03:53:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241223AbiAXDxx (ORCPT ); Sun, 23 Jan 2022 22:53:53 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:47044 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241222AbiAXDxv (ORCPT ); Sun, 23 Jan 2022 22:53:51 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 6876A1F3B1; Mon, 24 Jan 2022 03:53:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996430; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uJLgBSr+4ac2HtnFPf6KLC0wSgPyRK3wpQcub+LxsjE=; 
b=L+FUcoLPZYljCcLvX9K/q773EowcGOY8gOh2JXFdbx114stJhLzsplomQo6lS03JO1xwlC OavO7aodABOj7DM1zEtaS/Ps5dJk9Ssg9PCkZwiOH5l/8hLzNgEAcW9R8+SJWWbmvSdRn8 k3eQL2WEaG21D23do3D5THtg8zaCB24= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996430; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uJLgBSr+4ac2HtnFPf6KLC0wSgPyRK3wpQcub+LxsjE=; b=GFxtFEBVjqgFB2/OrAAU487VJVhuxNXgm3YLtOlyWOoqsbllVn80PezNyfHwKM+Br1Tv3T F0v5MecEjjKEBUDg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 07CFF13305; Mon, 24 Jan 2022 03:53:46 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id m2I4Lcoi7mHGRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:53:46 +0000 Subject: [PATCH 15/23] SUNRPC/call_alloc: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611282.26253.11804975093411638223.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. 
So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 4 +++- net/sunrpc/xprtrdma/transport.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index e2c835482791..d5b6e897f5a5 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -1023,8 +1023,10 @@ int rpc_malloc(struct rpc_task *task) struct rpc_buffer *buf; gfp_t gfp = GFP_NOFS; + if (RPC_IS_ASYNC(task)) + gfp = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - gfp = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 16e5696314a4..a52277115500 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -574,8 +574,10 @@ xprt_rdma_allocate(struct rpc_task *task) gfp_t flags; flags = RPCRDMA_DEF_GFP; + if (RPC_IS_ASYNC(task)) + flags = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721507 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with 
ESMTP id 505F1C433FE for ; Mon, 24 Jan 2022 03:54:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241241AbiAXDyE (ORCPT ); Sun, 23 Jan 2022 22:54:04 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:47058 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241240AbiAXDyD (ORCPT ); Sun, 23 Jan 2022 22:54:03 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 5F8221F3B1; Mon, 24 Jan 2022 03:54:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996442; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/Xf0wQNVm3xiMoEph9xuiX0w4MC4SZqJ4flfzelN2po=; b=emgzZtjciaKBzXok+isTqdewPHWN8QVhen+0ytn7jFeHei00IfPUvjy2BPRi6mEcVPbUrg LXA3vTnrwxGAgVWrpvxuWeZvI5lKlC7rDcIl+GQ7PDhbFYD26jjibZyD3S0Lp08+c56j/I /yy/YhrtQMN9noix8Byt2vBiidKJg0I= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996442; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/Xf0wQNVm3xiMoEph9xuiX0w4MC4SZqJ4flfzelN2po=; b=GWySUa2k8w9hiuYeo9wTlAVCgAxekBzAAMvelpOsSQucqbJT/OhbX0h7TTSHk95eX3Xwty tzzL4VYl83wZLWAw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by 
imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 7575913305; Mon, 24 Jan 2022 03:53:59 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id FjM9Ddci7mHNRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:53:59 +0000 Subject: [PATCH 16/23] SUNRPC/auth: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611283.26253.4389271361333923379.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries. 
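
As a rough illustration of the flag plumbing described above, the following userspace sketch models how an async task ends up with a non-blocking allocation, and what HZ>>4 works out to in ticks. Only the two RPCAUTH_LOOKUP_* values are taken from this patch. The sketch_* names and the two-valued enum standing in for the gfp mask are invented for the example and are not kernel APIs.

```c
#include <stdbool.h>

/* Flag values as defined in this patch. */
#define RPCAUTH_LOOKUP_NEW	0x01
#define RPCAUTH_LOOKUP_ASYNC	0x02

/* Hypothetical two-way stand-in for the gfp mask choice. */
enum sketch_alloc { SKETCH_MAY_BLOCK, SKETCH_NO_WAIT };

/* Models rpcauth_bindcred(): async tasks ask for a non-blocking lookup. */
static int sketch_lookup_flags(bool task_is_async)
{
	int lookupflags = 0;

	if (task_is_async)
		lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC;
	return lookupflags;
}

/* Models the ->lookup_cred side: GFP_NOFS by default, GFP_NOWAIT if async. */
static enum sketch_alloc sketch_alloc_mode(int lookupflags)
{
	if (lookupflags & RPCAUTH_LOOKUP_ASYNC)
		return SKETCH_NO_WAIT;
	return SKETCH_MAY_BLOCK;
}

/* Retry delay in ticks for a given HZ: one sixteenth of a second. */
static unsigned int sketch_retry_delay(unsigned int hz)
{
	return hz >> 4;
}
```

So with HZ=1000 the retry happens after 62 ticks, and with HZ=250 after 15. In both cases that is roughly 60ms, short enough that a transient shortage is retried quickly without spinning.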
Signed-off-by: NeilBrown
---
 include/linux/sunrpc/auth.h    |  1 +
 net/sunrpc/auth.c              |  6 +++++-
 net/sunrpc/auth_gss/auth_gss.c |  6 +++++-
 net/sunrpc/auth_unix.c         | 10 ++++++++--
 net/sunrpc/clnt.c              |  3 +++
 5 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 98da816b5fc2..3e6ce288a7fc 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -99,6 +99,7 @@ struct rpc_auth_create_args {
 
 /* Flags for rpcauth_lookupcred() */
 #define RPCAUTH_LOOKUP_NEW		0x01	/* Accept an uninitialised cred */
+#define RPCAUTH_LOOKUP_ASYNC		0x02	/* Don't block waiting for memory */
 
 /*
  * Client authentication ops
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index a9f0d17fdb0d..6bfa19f9fa6a 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -615,6 +615,8 @@ rpcauth_bind_root_cred(struct rpc_task *task, int lookupflags)
 	};
 	struct rpc_cred *ret;
 
+	if (RPC_IS_ASYNC(task))
+		lookupflags |= RPCAUTH_LOOKUP_ASYNC;
 	ret = auth->au_ops->lookup_cred(auth, &acred, lookupflags);
 	put_cred(acred.cred);
 	return ret;
@@ -631,6 +633,8 @@ rpcauth_bind_machine_cred(struct rpc_task *task, int lookupflags)
 
 	if (!acred.principal)
 		return NULL;
+	if (RPC_IS_ASYNC(task))
+		lookupflags |= RPCAUTH_LOOKUP_ASYNC;
 	return auth->au_ops->lookup_cred(auth, &acred, lookupflags);
 }
 
@@ -654,7 +658,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags)
 	};
 
 	if (flags & RPC_TASK_ASYNC)
-		lookupflags |= RPCAUTH_LOOKUP_NEW;
+		lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC;
 	if (task->tk_op_cred)
 		/* Task must use exactly this rpc_cred */
 		new = get_rpccred(task->tk_op_cred);
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 5f42aa5fc612..df72d6301f78 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1341,7 +1341,11 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits)
 static struct rpc_cred *
 gss_lookup_cred(struct rpc_auth
*auth, struct auth_cred *acred, int flags) { - return rpcauth_lookup_credcache(auth, acred, flags, GFP_NOFS); + gfp_t gfp = GFP_NOFS; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + return rpcauth_lookup_credcache(auth, acred, flags, gfp); } static struct rpc_cred * diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c index e7df1f782b2e..e5819265dd1b 100644 --- a/net/sunrpc/auth_unix.c +++ b/net/sunrpc/auth_unix.c @@ -43,8 +43,14 @@ unx_destroy(struct rpc_auth *auth) static struct rpc_cred * unx_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - struct rpc_cred *ret = mempool_alloc(unix_pool, GFP_NOFS); - + gfp_t gfp = GFP_NOFS; + struct rpc_cred *ret; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + ret = mempool_alloc(unix_pool, gfp); + if (!ret) + return ERR_PTR(-ENOMEM); rpcauth_init_cred(ret, acred, auth, &unix_credops); ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE; return ret; diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index a312ea2bc440..238b2ef5491f 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1745,6 +1745,9 @@ call_refreshresult(struct rpc_task *task) task->tk_cred_retry--; trace_rpc_retry_refresh_status(task); return; + case -ENOMEM: + rpc_delay(task, HZ >> 4); + return; } trace_rpc_refresh_status(task); rpc_call_rpcerror(task, status); From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721508 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47D2AC433F5 for ; Mon, 24 Jan 2022 03:54:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235933AbiAXDyO (ORCPT ); Sun, 23 Jan 2022 22:54:14 -0500 Received: from 
smtp-out1.suse.de ([195.135.220.28]:56994 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241230AbiAXDyN (ORCPT ); Sun, 23 Jan 2022 22:54:13 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id C88BC21995; Mon, 24 Jan 2022 03:54:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996452; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f9JPZARu0omLin5bRQVSgWDU6wAXTb1AVpCm2PGq0QA=; b=oWLVbYedFQP9v6Y+Vt2dCmYJ7BswiwzErr2bNqohkHzF29J5NtnzRR3hInA0/7Pbsg+a4p dkVN24TPtV/Zna3iFWM09kMIkBk2VhKvphP+cHwNm6UvHMXPUMMbSiNEVjA8OCcDOWDEtr bYJ4pdL9OTrQ+svPLtAoHw7TN/KouOY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996452; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f9JPZARu0omLin5bRQVSgWDU6wAXTb1AVpCm2PGq0QA=; b=d/H/0ZHUojlxTItoes3CwtM1O1/+IIfnm9ut5aMrLo6Gn1mp2AaqBsiqk+9II7iICgBBaQ 0SDOZBWG7iH2FMBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D7F8513305; Mon, 24 Jan 2022 03:54:09 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 
qc9UJeEi7mHfRQAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:54:09 +0000 Subject: [PATCH 17/23] SUNRPC/xprt: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611283.26253.16655442929244733353.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. 

Signed-off-by: NeilBrown
---
 net/sunrpc/xprt.c               | 5 ++++-
 net/sunrpc/xprtrdma/transport.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a02de2bddb28..47d207e416ab 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1687,12 +1687,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task
 static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt)
 {
 	struct rpc_rqst *req = ERR_PTR(-EAGAIN);
+	gfp_t gfp_mask = GFP_NOFS;

 	if (xprt->num_reqs >= xprt->max_reqs)
 		goto out;
 	++xprt->num_reqs;
 	spin_unlock(&xprt->reserve_lock);
-	req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS);
+	if (current->flags & PF_WQ_WORKER)
+		gfp_mask |= __GFP_NORETRY | __GFP_NOWARN;
+	req = kzalloc(sizeof(struct rpc_rqst), gfp_mask);
 	spin_lock(&xprt->reserve_lock);
 	if (req != NULL)
 		goto out;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index a52277115500..32df23796747 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -521,7 +521,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
 	return;

 out_sleep:
-	task->tk_status = -EAGAIN;
+	task->tk_status = -ENOMEM;
 	xprt_add_backlog(xprt, task);
 }

From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721509
Subject: [PATCH 18/23] SUNRPC: remove scheduling boost for "SWAPPER" tasks.
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611284.26253.5620153551522452540.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>

Currently, tasks marked as "swapper" tasks get put to the front of
non-priority rpc_queues, and are sorted earlier than non-swapper tasks on
the transport's ->xmit_queue.

This is pointless as currently *all* tasks for a mount that has swap
enabled on *any* file are marked as "swapper" tasks. So the net result
is that the non-priority rpc_queues are reverse-ordered (LIFO).

This scheduling boost is not necessary to avoid deadlocks, and hurts
fairness, so remove it. If there were a need to expedite some requests,
the tk_priority mechanism is a more appropriate tool.

Signed-off-by: NeilBrown
---
 net/sunrpc/sched.c |  7 -------
 net/sunrpc/xprt.c  | 11 -----------
 2 files changed, 18 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d5b6e897f5a5..256302bf6557 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue,

 /*
  * Add new request to wait queue.
- *
- * Swapper tasks always get inserted at the head of the queue.
- * This should avoid many nasty memory deadlocks and hopefully
- * improve overall performance.
- * Everyone else gets appended to the queue to ensure proper FIFO behavior.
  */
 static void __rpc_add_wait_queue(struct rpc_wait_queue *queue,
 				  struct rpc_task *task,
@@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue,
 	INIT_LIST_HEAD(&task->u.tk_wait.timer_list);
 	if (RPC_IS_PRIORITY(queue))
 		__rpc_add_wait_queue_priority(queue, task, queue_priority);
-	else if (RPC_IS_SWAPPER(task))
-		list_add(&task->u.tk_wait.list, &queue->tasks[0]);
 	else
 		list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]);
 	task->tk_waitqueue = queue;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 47d207e416ab..a0a2583fe941 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1354,17 +1354,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
 				INIT_LIST_HEAD(&req->rq_xmit2);
 				goto out;
 			}
-		} else if (RPC_IS_SWAPPER(task)) {
-			list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
-				if (pos->rq_cong || pos->rq_bytes_sent)
-					continue;
-				if (RPC_IS_SWAPPER(pos->rq_task))
-					continue;
-				/* Note: req is added _before_ pos */
-				list_add_tail(&req->rq_xmit, &pos->rq_xmit);
-				INIT_LIST_HEAD(&req->rq_xmit2);
-				goto out;
-			}
 		} else if (!req->rq_seqno) {
 			list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
 				if (pos->rq_task->tk_owner != task->tk_owner)

From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721510
Subject: [PATCH 19/23] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611284.26253.4993812368278110635.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>

NFS_RPC_SWAPFLAGS is only used for READ requests.
It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to
requests. This is not needed for swap READ - though it is for writes
where it is set via a different mechanism.

RPC_TASK_ROOTCREDS causes the 'machine' credential to be used.
This is not needed as the root credential is saved when the swap file is
opened, and this is used for all IO.

So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of
RPC_TASK_ROOTCREDS, that isn't needed either.

Remove both.

Signed-off-by: NeilBrown
---
 fs/nfs/read.c                 | 4 ----
 include/linux/nfs_fs.h        | 5 -----
 include/linux/sunrpc/sched.h  | 1 -
 include/trace/events/sunrpc.h | 1 -
 net/sunrpc/auth.c             | 2 +-
 5 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index eb00229c1a50..cd797ce3a67c 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -194,10 +194,6 @@ static void nfs_initiate_read(struct nfs_pgio_header *hdr,
 			      const struct nfs_rpc_ops *rpc_ops,
 			      struct rpc_task_setup *task_setup_data, int how)
 {
-	struct inode *inode = hdr->inode;
-	int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
-
-	task_setup_data->flags |= swap_flags;
 	rpc_ops->read_setup(hdr, msg);
 	trace_nfs_initiate_read(hdr);
 }
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index aba38dc4fd29..9e87752bdd00 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -45,11 +45,6 @@
  */
 #define NFS_MAX_TRANSPORTS 16

-/*
- * These are the default flags for swap requests
- */
-#define NFS_RPC_SWAPFLAGS		(RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS)
-
 /*
  * Size of the NFS directory verifier
  */
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index db964bb63912..56710f8056d3 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -124,7 +124,6 @@ struct rpc_task_setup {
 #define RPC_TASK_MOVEABLE	0x0004		/* nfs4.1+ rpc tasks */
 #define RPC_TASK_NULLCREDS	0x0010		/* Use AUTH_NULL credential */
 #define RPC_CALL_MAJORSEEN	0x0020		/* major timeout seen */
-#define RPC_TASK_ROOTCREDS	0x0040		/* force root creds */
 #define RPC_TASK_DYNAMIC	0x0080		/* task was kmalloc'ed */
 #define RPC_TASK_NO_ROUND_ROBIN	0x0100		/* send requests on "main" xprt */
 #define RPC_TASK_SOFT		0x0200		/* Use soft timeouts */
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 1e566ac4b812..ef9e9351cb2f 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -311,7 +311,6 @@ TRACE_EVENT(rpc_request,
 		{ RPC_TASK_MOVEABLE, "MOVEABLE" },			\
 		{ RPC_TASK_NULLCREDS, "NULLCREDS" },			\
 		{ RPC_CALL_MAJORSEEN, "MAJORSEEN" },			\
-		{ RPC_TASK_ROOTCREDS, "ROOTCREDS" },			\
 		{ RPC_TASK_DYNAMIC, "DYNAMIC" },			\
 		{ RPC_TASK_NO_ROUND_ROBIN, "NO_ROUND_ROBIN" },		\
 		{ RPC_TASK_SOFT, "SOFT" },				\
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 6bfa19f9fa6a..682fcd24bf43 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -670,7 +670,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags)
 	/* If machine cred couldn't be bound, try a root cred */
 	if (new)
 		;
-	else if (cred == &machine_cred || (flags & RPC_TASK_ROOTCREDS))
+	else if (cred == &machine_cred)
 		new = rpcauth_bind_root_cred(task, lookupflags);
 	else if (flags & RPC_TASK_NULLCREDS)
 		new = authnull_ops.lookup_cred(NULL, NULL, 0);

From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721511
Subject: [PATCH 20/23] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611285.26253.480157765187909362.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>

rpc tasks can be marked as RPC_TASK_SWAPPER. This causes GFP_MEMALLOC
to be used for some allocations. This is needed in some cases, but not
in all where it is currently provided, and in some where it isn't
provided.

Currently *all* tasks associated with a rpc_client on which swap is
enabled get the flag and hence some GFP_MEMALLOC support.

GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it.
However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does
need it.

xdr_alloc_bvec is called while the XPRT_LOCK is held. If this blocks,
then it blocks all other queued tasks. So this allocation needs
GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used
for any swap writes.

Similarly, if the transport is not connected, that will block all
requests including swap writes, so memory allocations should get
GFP_MEMALLOC if swap writes are possible.

So with this patch:
 1/ we ONLY set RPC_TASK_SWAPPER for swap writes.
 2/ __rpc_execute() sets PF_MEMALLOC while handling any task
    with RPC_TASK_SWAPPER set, or when handling any task that
    holds the XPRT_LOCKED lock on an xprt used for swap.
    This removes the need for the RPC_IS_SWAPPER() test
    in ->buf_alloc handlers.
 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking
    any task to a swapper xprt. __rpc_execute() will clear it.
 4/ PF_MEMALLOC is set for all the connect workers.

Signed-off-by: NeilBrown
---
 fs/nfs/write.c                  |  2 ++
 net/sunrpc/clnt.c               |  2 --
 net/sunrpc/sched.c              | 20 +++++++++++++++++---
 net/sunrpc/xprt.c               |  3 +++
 net/sunrpc/xprtrdma/transport.c |  6 ++++--
 net/sunrpc/xprtsock.c           |  8 ++++++++
 6 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 987a187bd39a..9f7176745fef 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1409,6 +1409,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr,
 {
 	int priority = flush_task_priority(how);

+	if (IS_SWAPFILE(hdr->inode))
+		task_setup_data->flags |= RPC_TASK_SWAPPER;
 	task_setup_data->priority = priority;
 	rpc_ops->write_setup(hdr, msg, &task_setup_data->rpc_client);
 	trace_nfs_initiate_write(hdr);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 238b2ef5491f..cb76fbea3ed5 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1085,8 +1085,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 		task->tk_flags |= RPC_TASK_TIMEOUT;
 	if (clnt->cl_noretranstimeo)
 		task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT;
-	if (atomic_read(&clnt->cl_swapper))
-		task->tk_flags |= RPC_TASK_SWAPPER;
 	/* Add to the client's list of all tasks */
 	spin_lock(&clnt->cl_lock);
 	list_add_tail(&task->tk_task, &clnt->cl_tasks);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 256302bf6557..9020cedb7c95 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -869,6 +869,15 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata)
 		ops->rpc_release(calldata);
 }

+static bool xprt_needs_memalloc(struct rpc_xprt *xprt, struct rpc_task *tk)
+{
+	if (!xprt)
+		return false;
+	if (!atomic_read(&xprt->swapper))
+		return false;
+	return test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == tk;
+}
+
 /*
  * This is the RPC `scheduler' (or rather, the finite state machine).
  */
@@ -877,6 +886,7 @@ static void __rpc_execute(struct rpc_task *task)
 	struct rpc_wait_queue *queue;
 	int task_is_async = RPC_IS_ASYNC(task);
 	int status = 0;
+	unsigned long pflags = current->flags;

 	WARN_ON_ONCE(RPC_IS_QUEUED(task));
 	if (RPC_IS_QUEUED(task))
@@ -899,6 +909,10 @@ static void __rpc_execute(struct rpc_task *task)
 		}
 		if (!do_action)
 			break;
+		if (RPC_IS_SWAPPER(task) ||
+		    xprt_needs_memalloc(task->tk_xprt, task))
+			current->flags |= PF_MEMALLOC;
+
 		trace_rpc_task_run_action(task, do_action);
 		do_action(task);

@@ -936,7 +950,7 @@ static void __rpc_execute(struct rpc_task *task)
 		rpc_clear_running(task);
 		spin_unlock(&queue->lock);
 		if (task_is_async)
-			return;
+			goto out;

 		/* sync task: sleep here */
 		trace_rpc_task_sync_sleep(task, task->tk_action);
@@ -960,6 +974,8 @@ static void __rpc_execute(struct rpc_task *task)

 	/* Release all resources associated with the task */
 	rpc_release_task(task);
+out:
+	current_restore_flags(pflags, PF_MEMALLOC);
 }

 /*
@@ -1018,8 +1034,6 @@ int rpc_malloc(struct rpc_task *task)

 	if (RPC_IS_ASYNC(task))
 		gfp = GFP_NOWAIT | __GFP_NOWARN;
-	if (RPC_IS_SWAPPER(task))
-		gfp |= __GFP_MEMALLOC;

 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a0a2583fe941..0614e7463d4b 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1492,6 +1492,9 @@ bool xprt_prepare_transmit(struct rpc_task *task)
 		return false;

 	}
+	if (atomic_read(&xprt->swapper))
+		/* This will be clear in __rpc_execute */
+		current->flags |= PF_MEMALLOC;
 	return true;
 }

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 32df23796747..256b06a92391 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -239,8 +239,11 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	struct rpcrdma_xprt *r_xprt = container_of(work, struct rpcrdma_xprt,
 						   rx_connect_worker.work);
 	struct rpc_xprt *xprt = &r_xprt->rx_xprt;
+	unsigned int pflags = current->flags;
 	int rc;

+	if (atomic_read(&xprt->swapper))
+		current->flags |= PF_MEMALLOC;
 	rc = rpcrdma_xprt_connect(r_xprt);
 	xprt_clear_connecting(xprt);
 	if (!rc) {
@@ -254,6 +257,7 @@ xprt_rdma_connect_worker(struct work_struct *work)
 		rpcrdma_xprt_disconnect(r_xprt);
 	xprt_unlock_connect(xprt, r_xprt);
 	xprt_wake_pending_tasks(xprt, rc);
+	current_restore_flags(pflags, PF_MEMALLOC);
 }

 /**
@@ -576,8 +580,6 @@ xprt_rdma_allocate(struct rpc_task *task)
 	flags = RPCRDMA_DEF_GFP;
 	if (RPC_IS_ASYNC(task))
 		flags = GFP_NOWAIT | __GFP_NOWARN;
-	if (RPC_IS_SWAPPER(task))
-		flags |= __GFP_MEMALLOC;

 	if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize,
 				  flags))
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d8ee06a9650a..9d34c71004fa 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2047,7 +2047,10 @@ static void xs_udp_setup_socket(struct work_struct *work)
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock;
 	int status = -EIO;
+	unsigned int pflags = current->flags;

+	if (atomic_read(&xprt->swapper))
+		current->flags |= PF_MEMALLOC;
 	sock = xs_create_sock(xprt, transport,
 			xs_addr(xprt)->sa_family, SOCK_DGRAM,
 			IPPROTO_UDP, false);
@@ -2067,6 +2070,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 	xprt_clear_connecting(xprt);
 	xprt_unlock_connect(xprt, transport);
 	xprt_wake_pending_tasks(xprt, status);
+	current_restore_flags(pflags, PF_MEMALLOC);
 }

 /**
@@ -2226,7 +2230,10 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
 	int status;
+	unsigned int pflags = current->flags;

+	if (atomic_read(&xprt->swapper))
+		current->flags |= PF_MEMALLOC;
 	if (!sock) {
 		sock = xs_create_sock(xprt, transport,
 				xs_addr(xprt)->sa_family, SOCK_STREAM,
@@ -2291,6 +2298,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	xprt_clear_connecting(xprt);
 out_unlock:
 	xprt_unlock_connect(xprt, transport);
+	current_restore_flags(pflags, PF_MEMALLOC);
 }

 /**

From patchwork Mon Jan 24 03:48:32 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12721512
Subject: [PATCH 21/23] NFSv4: keep state manager thread active if swap is enabled
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
 Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Mon, 24 Jan 2022 14:48:32 +1100
Message-ID: <164299611286.26253.4223340247208710012.stgit@noble.brown>
In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown>
References: <164299573337.26253.7538614611220034049.stgit@noble.brown>

If we are swapping over NFSv4, we may not be able to allocate memory to
start the state-manager thread at the time when we need it.
So keep it always running when swap is enabled, and just signal it to
start.

This requires updating and testing the cl_swapper count on the root
rpc_clnt after following all ->cl_parent links.

Signed-off-by: NeilBrown
---
 fs/nfs/file.c           | 15 ++++++++++++---
 fs/nfs/nfs4_fs.h        |  1 +
 fs/nfs/nfs4proc.c       | 20 ++++++++++++++++++++
 fs/nfs/nfs4state.c      | 39 +++++++++++++++++++++++++++++++++------
 include/linux/nfs_xdr.h |  2 ++
 net/sunrpc/clnt.c       |  2 ++
 6 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 04ba56f223d3..ceacae8e7a38 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -486,8 +486,9 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	unsigned long blocks;
 	long long isize;
 	int ret;
-	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = file_inode(file);
+	struct rpc_clnt *clnt = NFS_CLIENT(inode);
+	struct nfs_client *cl = NFS_SERVER(inode)->nfs_client;

 	spin_lock(&inode->i_lock);
 	blocks = inode->i_blocks;
@@ -508,14 +509,22 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	}
 	*span = sis->pages;
 	sis->flags |= SWP_FS_OPS;
+
+	if (cl->rpc_ops->enable_swap)
+		cl->rpc_ops->enable_swap(inode);
+
 	return ret;
 }

 static void nfs_swap_deactivate(struct file *file)
 {
-	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
+	struct inode *inode = file_inode(file);
+	struct rpc_clnt *clnt = NFS_CLIENT(inode);
+	struct nfs_client *cl = NFS_SERVER(inode)->nfs_client;

 	rpc_clnt_swap_deactivate(clnt);
+	if (cl->rpc_ops->disable_swap)
+		cl->rpc_ops->disable_swap(file_inode(file));
 }

 const struct address_space_operations nfs_file_aops = {
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index ed5eaca6801e..8a9ce0f42efd 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -42,6 +42,7 @@ enum nfs4_client_state {
 	NFS4CLNT_LEASE_MOVED,
 	NFS4CLNT_DELEGATION_EXPIRED,
 	NFS4CLNT_RUN_MANAGER,
+	NFS4CLNT_MANAGER_AVAILABLE,
 	NFS4CLNT_RECALL_RUNNING,
 	NFS4CLNT_RECALL_ANY_LAYOUT_READ,
 	NFS4CLNT_RECALL_ANY_LAYOUT_RW,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ee3bc79f6ca3..ab6382f9cbf0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -10347,6 +10347,24 @@ static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size)
 	return error + error2 + error3;
 }

+static void nfs4_enable_swap(struct inode *inode)
+{
+	/* The state manager thread must always be running.
+	 * It will notice the client is a swapper, and stay put.
+	 */
+	struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+	nfs4_schedule_state_manager(clp);
+}
+
+static void nfs4_disable_swap(struct inode *inode)
+{
+	/* The state manager thread will now exit once it is
+	 * woken.
+	 */
+	wake_up_var(&NFS_SERVER(inode)->nfs_client->cl_state);
+}
+
 static const struct inode_operations nfs4_dir_inode_operations = {
 	.create		= nfs_create,
 	.lookup		= nfs_lookup,
@@ -10423,6 +10441,8 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
 	.free_client	= nfs4_free_client,
 	.create_server	= nfs4_create_server,
 	.clone_server	= nfs_clone_server,
+	.enable_swap	= nfs4_enable_swap,
+	.disable_swap	= nfs4_disable_swap,
 };

 static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = {
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index d88b779f9dd0..05d53c8a2ddb 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1205,10 +1205,17 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
 {
 	struct task_struct *task;
 	char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1];
+	struct rpc_clnt *cl = clp->cl_rpcclient;
+
+	while (cl != cl->cl_parent)
+		cl = cl->cl_parent;

 	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
-	if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
+	if (test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) != 0) {
+		wake_up_var(&clp->cl_state);
 		return;
+	}
+	set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state);

 	__module_get(THIS_MODULE);
 	refcount_inc(&clp->cl_count);
@@ -1224,6 +1231,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
 		printk(KERN_ERR "%s: kthread_run: %ld\n",
 			__func__, PTR_ERR(task));
 		nfs4_clear_state_manager_bit(clp);
+		clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state);
 		nfs_put_client(clp);
 		module_put(THIS_MODULE);
 	}
@@ -2665,11 +2673,8 @@ static void nfs4_state_manager(struct nfs_client *clp)
 			clear_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state);
 		}

-		/* Did we race with an attempt to give us more work?
*/ - if (!test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) - return; - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) - return; + return; + } while (refcount_read(&clp->cl_count) > 1 && !signalled()); goto out_drain; @@ -2689,9 +2694,31 @@ static void nfs4_state_manager(struct nfs_client *clp) static int nfs4_run_state_manager(void *ptr) { struct nfs_client *clp = ptr; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; allow_signal(SIGKILL); +again: + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); nfs4_state_manager(clp); + if (atomic_read(&cl->cl_swapper)) { + wait_var_event_interruptible(&clp->cl_state, + test_bit(NFS4CLNT_RUN_MANAGER, + &clp->cl_state)); + if (atomic_read(&cl->cl_swapper) && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) + goto again; + /* Either no longer a swapper, or were signalled */ + } + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); + + if (refcount_read(&clp->cl_count) > 1 && !signalled() && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) && + !test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state)) + goto again; + nfs_put_client(clp); module_put_and_kthread_exit(0); return 0; diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 967a0098f0a9..04cf3a8fb949 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1795,6 +1795,8 @@ struct nfs_rpc_ops { struct nfs_server *(*create_server)(struct fs_context *); struct nfs_server *(*clone_server)(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, rpc_authflavor_t); + void (*enable_swap)(struct inode *inode); + void (*disable_swap)(struct inode *inode); }; /* diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index cb76fbea3ed5..4cb403a0f334 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -3066,6 +3066,8 @@ rpc_clnt_swap_activate_callback(struct rpc_clnt *clnt, int rpc_clnt_swap_activate(struct rpc_clnt *clnt) { + while (clnt != clnt->cl_parent) + clnt = 
clnt->cl_parent; if (atomic_inc_return(&clnt->cl_swapper) == 1) return rpc_clnt_iterate_for_each_xprt(clnt, rpc_clnt_swap_activate_callback, NULL); From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721513 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F2AFC433EF for ; Mon, 24 Jan 2022 03:55:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241270AbiAXDzZ (ORCPT ); Sun, 23 Jan 2022 22:55:25 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:47144 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235321AbiAXDzZ (ORCPT ); Sun, 23 Jan 2022 22:55:25 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 314321F3A0; Mon, 24 Jan 2022 03:55:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996524; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jKekh97n0i6AmpeVDE22svCq5nXmamLchP56LCBUk8Y=; b=cghIZ+kikeLgdtBbznxHOS+3WEdmBrO58IXPnQI2796gYuTKI0vwTkL6GMNqI2mYUIIeU6 ReNJlFXbDr6Eb54AwmHUbjX4yrIkQyU1S2kMSlYL4L6gRIdVymk3q/9Cqx7AimvDYswQYI nxCqilDbnGhP6QAEsarThDH0Sw84OTo= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996524; 
h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jKekh97n0i6AmpeVDE22svCq5nXmamLchP56LCBUk8Y=; b=pyWYRf1SlHJfVNvJ5zVKWKuw4WN+JbjrZRekAoxXuyjygtU2r+5TTggEeP2gXDcVFJ3xIg XpH/SY4pzZRfnFAQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2E76D13305; Mon, 24 Jan 2022 03:55:20 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id g+iONygj7mFHRgAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:55:20 +0000 Subject: [PATCH 22/23] NFS: swap-out must always use STABLE writes. From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611287.26253.13462969110743208198.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. 
However, if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE. Signed-off-by: NeilBrown --- fs/nfs/direct.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 43a956d7fd62..29c007b2a17a 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -791,7 +791,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { */ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, struct iov_iter *iter, - loff_t pos) + loff_t pos, int ioflags) { struct nfs_pageio_descriptor desc; struct inode *inode = dreq->inode; @@ -799,7 +799,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t requested_bytes = 0; size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE); - nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false, + nfs_pageio_init_write(&desc, inode, ioflags, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; get_dreq(dreq); @@ -905,6 +905,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, struct nfs_direct_req *dreq; struct nfs_lock_context *l_ctx; loff_t pos, end; + int ioflags = swap ? 
FLUSH_STABLE : FLUSH_COND_STABLE; dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos); @@ -947,7 +948,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, if (!swap) nfs_start_io_direct(inode); - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, ioflags); if (mapping->nrpages) { invalidate_inode_pages2_range(mapping, From patchwork Mon Jan 24 03:48:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12721514 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FA42C433F5 for ; Mon, 24 Jan 2022 03:55:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241283AbiAXDzh (ORCPT ); Sun, 23 Jan 2022 22:55:37 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:57064 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235321AbiAXDzg (ORCPT ); Sun, 23 Jan 2022 22:55:36 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 378CC21997; Mon, 24 Jan 2022 03:55:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1642996535; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WFyjPabVahDB+vrXElKQ+UDq/fcSAdM6+RjOQdJJ+zQ=; 
b=VgQrwpIxLgoCjPnFeLQ6BZXyoET8QaMuJj5dg3XrTQjG/Dyt8rnQo5H2GJQtXWKNEPjzw0 3LVlkZnGEErfLJ5EZGcO/nb+tBz6PyIHowNAMuCegClQg5ZrMhmmIZTDjms6NXFeLDRqPQ zG8uip/oHc07Wt3VH1lEmYNs8wJB0uM= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1642996535; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WFyjPabVahDB+vrXElKQ+UDq/fcSAdM6+RjOQdJJ+zQ=; b=JvFFvxy7hIQZ1+xEKNkM/EQRldT0iSe3V12OSgUpU+cHWb6H8JoYwh8OdaLk3As3Rgau0m WMhtltZp6u4HM8Ag== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id EEB8813305; Mon, 24 Jan 2022 03:55:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id eiDtKjMj7mFfRgAAMHmgww (envelope-from ); Mon, 24 Jan 2022 03:55:31 +0000 Subject: [PATCH 23/23] SUNRPC: lock against ->sock changing during sysfs read From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Mon, 24 Jan 2022 14:48:32 +1100 Message-ID: <164299611287.26253.6282866540933339893.stgit@noble.brown> In-Reply-To: <164299573337.26253.7538614611220034049.stgit@noble.brown> References: <164299573337.26253.7538614611220034049.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org ->sock can be set to NULL asynchronously unless ->recv_mutex is held. So it is important to hold that mutex. Otherwise a sysfs read can trigger an oops. 
Commit 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") appears to attempt to fix this problem, but it only narrows the race window. Fixes: 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") Fixes: a8482488a7d6 ("SUNRPC query transport's source port") Signed-off-by: NeilBrown --- net/sunrpc/sysfs.c | 5 ++++- net/sunrpc/xprtsock.c | 7 ++++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/sysfs.c b/net/sunrpc/sysfs.c index 2766dd21935b..baaf65ea9e38 100644 --- a/net/sunrpc/sysfs.c +++ b/net/sunrpc/sysfs.c @@ -115,11 +115,14 @@ static ssize_t rpc_sysfs_xprt_srcaddr_show(struct kobject *kobj, } sock = container_of(xprt, struct sock_xprt, xprt); - if (kernel_getsockname(sock->sock, (struct sockaddr *)&saddr) < 0) + mutex_lock(&sock->recv_mutex); + if (sock->sock == NULL || + kernel_getsockname(sock->sock, (struct sockaddr *)&saddr) < 0) goto out; ret = sprintf(buf, "%pISc\n", &saddr); out: + mutex_unlock(&sock->recv_mutex); xprt_put(xprt); return ret + 1; } diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 9d34c71004fa..3f2b766e9f82 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -1641,7 +1641,12 @@ static int xs_get_srcport(struct sock_xprt *transport) unsigned short get_srcport(struct rpc_xprt *xprt) { struct sock_xprt *sock = container_of(xprt, struct sock_xprt, xprt); - return xs_sock_getport(sock->sock); + unsigned short ret = 0; + mutex_lock(&sock->recv_mutex); + if (sock->sock) + ret = xs_sock_getport(sock->sock); + mutex_unlock(&sock->recv_mutex); + return ret; } EXPORT_SYMBOL(get_srcport);