From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B08EC433F5 for ; Tue, 16 Nov 2021 02:45:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A23176187F for ; Tue, 16 Nov 2021 02:45:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A23176187F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 455426B00AD; Mon, 15 Nov 2021 21:45:37 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 404D96B00AE; Mon, 15 Nov 2021 21:45:37 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 259696B00AF; Mon, 15 Nov 2021 21:45:37 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0168.hostedemail.com [216.40.44.168]) by kanga.kvack.org (Postfix) with ESMTP id 170966B00AD for ; Mon, 15 Nov 2021 21:45:37 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id C103018136846 for ; Tue, 16 Nov 2021 02:45:36 +0000 (UTC) X-FDA: 78813252714.14.DA4CD29 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf07.hostedemail.com (Postfix) with ESMTP id 5B9AC10000BA for ; Tue, 16 Nov 2021 02:45:36 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 398F321910; Tue, 16 Nov 2021 02:45:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030735; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1/BSIu6f8h3udxg/DtgGyKIFrkiRDq54LWgglD0VugE=; b=IplAxp1y0sekl7XcljUnG5z89tZMSeqbWmLOhyGMm6vG+6I7vAkuytzPBU64O7NA+Eh0fk y+rZNiyqm9YhWdNNwTYf9+lssWL3iGI3wRyd2yNX3uVMn7sAg6KKFp4i0ZFKbsZRwsO1P4 /BP+n7Jkn6O9I+qUqUoTwbwVKb1J6QE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030735; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1/BSIu6f8h3udxg/DtgGyKIFrkiRDq54LWgglD0VugE=; b=AWlfzuqouIRw64bl6P8ib6Gos4ML2cOYA82dE/tQ3Wp7jzocfC+UdQnkeEIwOpvnqpyzwP R2e+lzQOcjNzc6Bw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D80E713B70; Tue, 16 Nov 2021 02:45:32 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 1wlgJUwbk2G+CAAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:45:32 +0000 Subject: [PATCH 01/13] NFS: move generic_write_checks() call from nfs_file_direct_write() to nfs_file_write() From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064449.25805.2687706207398048223.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 5B9AC10000BA X-Stat-Signature: yfnxom1yhnihpanjm8ha47wsteorpjcr Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=IplAxp1y; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=AWlfzuqo; spf=pass (imf07.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1637030736-458829 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: generic_write_checks() is not needed for swap-out writes, and fails if they are attempted. nfs_file_direct_write() currently calls generic_write_checks() and is in turn called from: nfs_direct_IO - only for swap-out nfs_file_write - for normal O_DIRECT write So move the generic_write_checks() call into nfs_file_write(). This allows NFS swap-out writes to complete. Fixes: dc617f29dbe5 ("vfs: don't allow writes to swap files") Signed-off-by: NeilBrown --- fs/nfs/direct.c | 5 +---- fs/nfs/file.c | 6 +++++- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 9cff8709c80a..1e80d243ba25 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -905,10 +905,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos); - result = generic_write_checks(iocb, iter); - if (result <= 0) - return result; - count = result; + count = iov_iter_count(iter); nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES, count); pos = iocb->ki_pos; diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 24e7dccce355..45d8180b7be3 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -615,8 +615,12 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) if (result) return result; - if (iocb->ki_flags & IOCB_DIRECT) + if (iocb->ki_flags & IOCB_DIRECT) { + result = generic_write_checks(iocb, from); + if (result <= 0) + return result; return nfs_file_direct_write(iocb, from); + } dprintk("NFS: write(%pD2, %zu@%Ld)\n", file, iov_iter_count(from), (long long) iocb->ki_pos); From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B201DC433F5 for ; Tue, 16 Nov 2021 02:45:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3694461BF8 for ; Tue, 16 Nov 2021 02:45:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 3694461BF8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id C6D336B00AF; Mon, 15 Nov 2021 21:45:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C038B6B00B1; Mon, 15 Nov 2021 21:45:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A94BB6B00B0; Mon, 15 Nov 2021 21:45:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0031.hostedemail.com [216.40.44.31]) by kanga.kvack.org (Postfix) with ESMTP id 9AD1F6B00AE for ; Mon, 15 Nov 2021 21:45:46 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 5475C82499B9 for ; Tue, 16 Nov 2021 02:45:46 +0000 (UTC) X-FDA: 78813253092.27.2B8FD97 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf27.hostedemail.com (Postfix) with ESMTP id BD57A70000A2 for ; Tue, 16 Nov 2021 02:45:45 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 60D091FD48; Tue, 16 Nov 2021 02:45:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030744; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HAFJQqD66lEKxyMF66YMnhf/bOSH3X44K2KcJkng4Uk=; b=gbCVQ6bDbY22Q4NrW50evrXT7Ds3nDI8NTZ9Io9sBz8NoaC8jFFX4LTn9aA2PckSz94ueJ SvWZN4rSfdqNgtdqvCb3r6IMZTWyB8yR622TKV5wwN3g9rx4EO5seoFCc3Xi4NO3yH5dFZ v5umM8wYEzG+99gUwLsGHV5PyxHV710= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030744; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HAFJQqD66lEKxyMF66YMnhf/bOSH3X44K2KcJkng4Uk=; b=mqZ30JPHIzdhcGRAwSZF5MRmu7hx8qlwHnCXFyzSJrWYCq8+8g8lNDIo44ZDQDQku5U0lR tfeimJ8vg5wpEbAQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 0A16A13B70; Tue, 16 Nov 2021 02:45:41 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id jzimLlUbk2HNCAAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:45:41 +0000 Subject: [PATCH 02/13] NFS: do not take i_rwsem for swap IO From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064452.25805.5738767545414940042.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: szazthy7qh9yu6iqx1osdfpxwte8q14j Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=gbCVQ6bD; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=mqZ30JPH; spf=pass (imf27.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: BD57A70000A2 X-HE-Tag: 1637030745-762450 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So don't take the rwsem or set the NFS_INO_ODIRECT flag during IO to the swap file. Signed-off-by: NeilBrown --- fs/nfs/io.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/nfs/io.c b/fs/nfs/io.c index b5551ed8f648..83b4dfbb826d 100644 --- a/fs/nfs/io.c +++ b/fs/nfs/io.c @@ -118,11 +118,18 @@ static void nfs_block_buffered(struct nfs_inode *nfsi, struct inode *inode) * NFS_INO_ODIRECT. * Note that buffered writes and truncates both take a write lock on * inode->i_rwsem, meaning that those are serialised w.r.t. O_DIRECT. + * + * When inode IS_SWAPFILE we ignore the flag and don't take the rwsem + * as it triggers lockdep warnings and possible deadlocks. + * bufferred writes are forbidden anyway, and buffered reads will not + * be coherent. */ void nfs_start_io_direct(struct inode *inode) { struct nfs_inode *nfsi = NFS_I(inode); + if (IS_SWAPFILE(inode)) + return; /* Be an optimist! */ down_read(&inode->i_rwsem); if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) != 0) @@ -144,5 +151,7 @@ nfs_start_io_direct(struct inode *inode) void nfs_end_io_direct(struct inode *inode) { + if (IS_SWAPFILE(inode)) + return; up_read(&inode->i_rwsem); } From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B3A4C433F5 for ; Tue, 16 Nov 2021 02:45:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9EBD561C4F for ; Tue, 16 Nov 2021 02:45:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 9EBD561C4F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 2F5D26B00AE; Mon, 15 Nov 2021 21:45:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2A5866B00B0; Mon, 15 Nov 2021 21:45:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 16D656B00B1; Mon, 15 Nov 2021 21:45:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0215.hostedemail.com [216.40.44.215]) by kanga.kvack.org (Postfix) with ESMTP id 057786B00AE for ; Mon, 15 Nov 2021 21:45:55 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A462A84CEC for ; Tue, 16 Nov 2021 02:45:54 +0000 (UTC) X-FDA: 78813253428.20.AFD03FE Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf01.hostedemail.com (Postfix) with ESMTP id 844115092A30 for ; Tue, 16 Nov 2021 02:45:37 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id DCE7C1FD67; Tue, 16 Nov 2021 02:45:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030752; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SCPnWzrli9uhGf5cSQnA2kslw6w9vTsO2ZaI10TRMW0=; b=pZE6Mn45OybuBUV0fFOy6OA0abrIA0AR+Fglj1GmFirEaWFtVDkCDYxp+ocPMp60lDJ7N2 TaWmdEQKPcqYIl2pmPRYlaKiDIw7u0icJyyQk2e/fKk1Mzsv1R0v4/yvnKIgkZCWcNlf83 VqbAdkFTRkcWYEkojriQxOzGfTp73+4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030752; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SCPnWzrli9uhGf5cSQnA2kslw6w9vTsO2ZaI10TRMW0=; b=8qHcqlPn9r3KPsRJ2cy06JSUnxN3MrXeqxOuPnS8oTruMA9u3t2SC7iQhU5U1gJPcf5I7b 3IWDw2sfjb9aJ/Cg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 84C4713B70; Tue, 16 Nov 2021 02:45:49 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id AYUoLF0bk2HZCAAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:45:49 +0000 Subject: [PATCH 03/13] MM: reclaim mustn't enter FS for swap-over-NFS From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064452.25805.16932817889703270242.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 844115092A30 X-Stat-Signature: ism39bbf1r8dokjhw6bumnnbrwkowtkf Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=pZE6Mn45; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=8qHcqlPn; spf=pass (imf01.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1637030737-218139 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If swap-out is using filesystem operations (SWP_FS_OPS), then it is not safe to enter the FS for reclaim. So only down-grade the requirement for swap pages to __GFP_IO after checking that SWP_FS_OPS are not being used. Signed-off-by: NeilBrown Signed-off-by: NeilBrown Reported-by: kernel test robot --- mm/vmscan.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index fb9584641ac7..049ff4494081 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1513,8 +1513,14 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (!sc->may_unmap && page_mapped(page)) goto keep_locked; + /* ->flags can be updated non-atomicially (scan_swap_map_slots), + * but that will never affect SWP_FS_OPS, so the data_race + * is safe. + */ may_enter_fs = (sc->gfp_mask & __GFP_FS) || - (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO)); + (PageSwapCache(page) && + !data_race(page_swap_info(page)->flags & SWP_FS_OPS) && + (sc->gfp_mask & __GFP_IO)); /* * The number of dirty pages determines if a node is marked @@ -1682,7 +1688,9 @@ static unsigned int shrink_page_list(struct list_head *page_list, goto activate_locked_split; } - may_enter_fs = true; + if ((sc->gfp_mask & __GFP_FS) || + !data_race(page_swap_info(page)->flags & SWP_FS_OPS)) + may_enter_fs = true; /* Adding to swap updated mapping */ mapping = page_mapping(page); From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621135 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5720CC433EF for ; Tue, 16 Nov 2021 02:46:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 069E16187F for ; Tue, 16 Nov 2021 02:46:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 069E16187F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 936D16B00B0; Mon, 15 Nov 2021 21:46:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8E7026B00B1; Mon, 15 Nov 2021 21:46:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 788446B00B2; Mon, 15 Nov 2021 21:46:02 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0086.hostedemail.com [216.40.44.86]) by kanga.kvack.org (Postfix) with ESMTP id 675BB6B00B0 for ; Mon, 15 Nov 2021 21:46:02 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 2781E82FF4 for ; Tue, 16 Nov 2021 02:46:02 +0000 (UTC) X-FDA: 78813253764.30.6A7BD49 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf02.hostedemail.com (Postfix) with ESMTP id 8CB2F700177D for ; Tue, 16 Nov 2021 02:45:54 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 7F2D71FD6B; Tue, 16 Nov 2021 02:46:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030760; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uJLgBSr+4ac2HtnFPf6KLC0wSgPyRK3wpQcub+LxsjE=; b=JkAmyrxApRyuTBn7vn5Z5g6h1Fw+zu5G/EIQ3HoU4SR6DCkJTc9MQRxY5p+rbbFiopf+Ks DfC0qFsTopsEA3OGsIEL0PyieQTegaqXDMaaPL7gVyaFlhnW1dIbc/ade8zsDKNbivdhIn ENj9yTsm/EJZis0r4Llb018InpIGK5I= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030760; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uJLgBSr+4ac2HtnFPf6KLC0wSgPyRK3wpQcub+LxsjE=; b=Za/pDRAZdOa2LQNWoKokZQtyi9KJ4nzaePa487hjO6EW2QwB66ceS2eOY+PoOiRooaGF/M PTd0buGgp9HXduDQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2896513B70; Tue, 16 Nov 2021 02:45:57 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id Dl4tNmUbk2HiCAAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:45:57 +0000 Subject: [PATCH 04/13] SUNRPC/call_alloc: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064453.25805.15157255463292733812.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: xqequuqwdf3rru46xk68ag8ck65xxcqw Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=JkAmyrxA; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="Za/pDRAZ"; spf=pass (imf02.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 8CB2F700177D X-HE-Tag: 1637030754-902483 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 4 +++- net/sunrpc/xprtrdma/transport.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index e2c835482791..d5b6e897f5a5 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -1023,8 +1023,10 @@ int rpc_malloc(struct rpc_task *task) struct rpc_buffer *buf; gfp_t gfp = GFP_NOFS; + if (RPC_IS_ASYNC(task)) + gfp = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - gfp = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 16e5696314a4..a52277115500 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -574,8 +574,10 @@ xprt_rdma_allocate(struct rpc_task *task) gfp_t flags; flags = RPCRDMA_DEF_GFP; + if (RPC_IS_ASYNC(task)) + flags = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621137 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 873ADC433FE for ; Tue, 16 Nov 2021 02:46:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 380E861BF8 for ; Tue, 16 Nov 2021 02:46:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 380E861BF8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id C1D176B00B2; Mon, 15 Nov 2021 21:46:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BCCEA6B00B3; Mon, 15 Nov 2021 21:46:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A4A816B00B4; Mon, 15 Nov 2021 21:46:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0028.hostedemail.com [216.40.44.28]) by kanga.kvack.org (Postfix) with ESMTP id 942436B00B2 for ; Mon, 15 Nov 2021 21:46:08 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 5419983C39 for ; Tue, 16 Nov 2021 02:46:08 +0000 (UTC) X-FDA: 78813254016.23.711C779 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf13.hostedemail.com (Postfix) with ESMTP id 8E1D51052644 for ; Tue, 16 Nov 2021 02:45:54 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 400601FD33; Tue, 16 Nov 2021 02:46:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030766; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/Xf0wQNVm3xiMoEph9xuiX0w4MC4SZqJ4flfzelN2po=; b=VJIR0cyhy6+rDmsJPTjQNCyx87yUfBzkCGqr/9ZKK38FuAx0irS29ecpvQph7nc7DB0PXK zyfe2RrTBeLMFsQnbw0aDyV9mgNmBZ+2dg9nUh8mJyiMH3cJSrbMEQCNyFHnkLjpWCel+l 5+7Nzp22ee93S5O3y+UpWkSXnH+Oca0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030766; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/Xf0wQNVm3xiMoEph9xuiX0w4MC4SZqJ4flfzelN2po=; b=XYbFHa1Mf8GWawajp8Vr7gfPcNGx+veH+RIEdyXlAYbJOAcoi26JgWpw7if9c9Xjw6Fc55 6suakjsEmSFDMrBQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D452C13B70; Tue, 16 Nov 2021 02:46:03 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id gq5uJGsbk2HtCAAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:03 +0000 Subject: [PATCH 05/13] SUNRPC/auth: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064453.25805.5824369466390732183.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 8E1D51052644 X-Stat-Signature: p9apxqwg1cgy7ua1u6796nw1fenzx3dm Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=VJIR0cyh; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=XYbFHa1M; spf=pass (imf13.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1637030754-544840 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries. Signed-off-by: NeilBrown --- include/linux/sunrpc/auth.h | 1 + net/sunrpc/auth.c | 6 +++++- net/sunrpc/auth_gss/auth_gss.c | 6 +++++- net/sunrpc/auth_unix.c | 10 ++++++++-- net/sunrpc/clnt.c | 3 +++ 5 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h index 98da816b5fc2..3e6ce288a7fc 100644 --- a/include/linux/sunrpc/auth.h +++ b/include/linux/sunrpc/auth.h @@ -99,6 +99,7 @@ struct rpc_auth_create_args { /* Flags for rpcauth_lookupcred() */ #define RPCAUTH_LOOKUP_NEW 0x01 /* Accept an uninitialised cred */ +#define RPCAUTH_LOOKUP_ASYNC 0x02 /* Don't block waiting for memory */ /* * Client authentication ops diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index a9f0d17fdb0d..6bfa19f9fa6a 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -615,6 +615,8 @@ rpcauth_bind_root_cred(struct rpc_task *task, int lookupflags) }; struct rpc_cred *ret; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; ret = auth->au_ops->lookup_cred(auth, &acred, lookupflags); put_cred(acred.cred); return ret; @@ -631,6 +633,8 @@ rpcauth_bind_machine_cred(struct rpc_task *task, int lookupflags) if (!acred.principal) return NULL; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; return auth->au_ops->lookup_cred(auth, &acred, lookupflags); } @@ -654,7 +658,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) }; if (flags & RPC_TASK_ASYNC) - lookupflags |= RPCAUTH_LOOKUP_NEW; + lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC; if (task->tk_op_cred) /* Task must use exactly this rpc_cred */ new = get_rpccred(task->tk_op_cred); diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 5f42aa5fc612..df72d6301f78 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -1341,7 +1341,11 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits) static struct rpc_cred * gss_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - return rpcauth_lookup_credcache(auth, acred, flags, GFP_NOFS); + gfp_t gfp = GFP_NOFS; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + return rpcauth_lookup_credcache(auth, acred, flags, gfp); } static struct rpc_cred * diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c index e7df1f782b2e..e5819265dd1b 100644 --- a/net/sunrpc/auth_unix.c +++ b/net/sunrpc/auth_unix.c @@ -43,8 +43,14 @@ unx_destroy(struct rpc_auth *auth) static struct rpc_cred * unx_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - struct rpc_cred *ret = mempool_alloc(unix_pool, GFP_NOFS); - + gfp_t gfp = GFP_NOFS; + struct rpc_cred *ret; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + ret = mempool_alloc(unix_pool, gfp); + if (!ret) + return ERR_PTR(-ENOMEM); rpcauth_init_cred(ret, acred, auth, &unix_credops); ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE; return ret; diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index a312ea2bc440..238b2ef5491f 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1745,6 +1745,9 @@ call_refreshresult(struct rpc_task *task) task->tk_cred_retry--; trace_rpc_retry_refresh_status(task); return; + case -ENOMEM: + rpc_delay(task, HZ >> 4); + return; } trace_rpc_refresh_status(task); rpc_call_rpcerror(task, status); From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621139 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D379C433EF for ; Tue, 16 Nov 2021 02:46:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E32A761C15 for ; Tue, 16 Nov 2021 02:46:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E32A761C15 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 7D7006B00B4; Mon, 15 Nov 2021 21:46:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7864A6B00B5; Mon, 15 Nov 2021 21:46:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 64EAE6B00B6; Mon, 15 Nov 2021 21:46:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0024.hostedemail.com [216.40.44.24]) by kanga.kvack.org (Postfix) with ESMTP id 547796B00B4 for ; Mon, 15 Nov 2021 21:46:15 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 174A084795 for ; Tue, 16 Nov 2021 02:46:15 +0000 (UTC) X-FDA: 78813254268.17.11A65FB Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf15.hostedemail.com (Postfix) with ESMTP id F00A9D00009F for ; Tue, 16 Nov 2021 02:45:58 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 8CA8F212C9; Tue, 16 Nov 2021 02:46:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030773; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f9JPZARu0omLin5bRQVSgWDU6wAXTb1AVpCm2PGq0QA=; b=atmetXEZDy8cMBJPuxGdsioA9iEEjKYPYqm8r3COaJLFACTYSEnML7mYQBMmGJy6qWq+sX 17efbj7aEBJcrchB+nb1mGhfUGNR07Lbzb50Y1OpJ6jJRTv1E1ym8QpibclWvHrVTz6UqJ 0VLAHKfisqbII2bdSfpPxKIVMbaS6ng= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030773; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f9JPZARu0omLin5bRQVSgWDU6wAXTb1AVpCm2PGq0QA=; b=QhGRS68RrxSioGc5OYbdql4ol0fghFMX4at/b29RBqkbuBMOvukvloUoM4ifPSimpAdbic 4B6GESV1oZ1RaDBQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3665113B70; Tue, 16 Nov 2021 02:46:10 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 8gWHOXIbk2H0CAAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:10 +0000 Subject: [PATCH 06/13] SUNRPC/xprt: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064454.25805.13286573055267156315.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: F00A9D00009F X-Stat-Signature: j6yk6gqqnkmxjt3o8bqhyqgkwc3huf73 Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=atmetXEZ; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=QhGRS68R; spf=pass (imf15.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1637030758-39699 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. Signed-off-by: NeilBrown --- net/sunrpc/xprt.c | 5 ++++- net/sunrpc/xprtrdma/transport.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a02de2bddb28..47d207e416ab 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1687,12 +1687,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt) { struct rpc_rqst *req = ERR_PTR(-EAGAIN); + gfp_t gfp_mask = GFP_NOFS; if (xprt->num_reqs >= xprt->max_reqs) goto out; ++xprt->num_reqs; spin_unlock(&xprt->reserve_lock); - req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS); + if (current->flags & PF_WQ_WORKER) + gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; + req = kzalloc(sizeof(struct rpc_rqst), gfp_mask); spin_lock(&xprt->reserve_lock); if (req != NULL) goto out; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index a52277115500..32df23796747 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -521,7 +521,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) return; out_sleep: - task->tk_status = -EAGAIN; + task->tk_status = -ENOMEM; xprt_add_backlog(xprt, task); } From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621141 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37135C433F5 for ; Tue, 16 Nov 2021 02:46:27 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D27DC61BF8 for ; Tue, 16 Nov 2021 02:46:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D27DC61BF8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 7891A6B00B6; Mon, 15 Nov 2021 21:46:26 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7386C6B00B8; Mon, 15 Nov 2021 21:46:26 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6003C6B00B9; Mon, 15 Nov 2021 21:46:26 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0197.hostedemail.com [216.40.44.197]) by kanga.kvack.org (Postfix) with ESMTP id 511E66B00B6 for ; Mon, 15 Nov 2021 21:46:26 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E86B4180458F6 for ; Tue, 16 Nov 2021 02:46:25 +0000 (UTC) X-FDA: 78813254730.23.64CDD2B Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf05.hostedemail.com (Postfix) with ESMTP id 39A305092A2A for ; Tue, 16 Nov 2021 02:45:57 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E893321923; Tue, 16 Nov 2021 02:46:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030780; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ca/QMA9b8aSP2XqDNpzMoh70WOqFOvfMXBDfQmXLg8g=; b=BA/7VVkzYgikqnljre6z0kEzMXWBR1jLLrcLnnWltAfoNyOQtK6J+Jx4wupP/+IAlYmhdz GSwNAmMbB34r22uBTPoX9/ltDAPwZWZFXWI4bCF+Q6Sj/SGF8ceVslMyUgbXNPrAv6mBFN FJ+9r2R4C4yF8Q5j6BzgOgQg8mcgWYI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030780; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ca/QMA9b8aSP2XqDNpzMoh70WOqFOvfMXBDfQmXLg8g=; b=3KkNx19ZyXLKlEyPsJGjz3XKnm9NLUyjpmlEuEAliLF8FgVI+BmEwWuaJ7vvM6+PUzidlO Hsqe/ZF2mUxoTiBg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9363213B70; Tue, 16 Nov 2021 02:46:18 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id gueiFHobk2H8CAAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:18 +0000 Subject: [PATCH 07/13] SUNRPC: remove scheduling boost for "SWAPPER" tasks. From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064454.25805.6702162055736458441.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: emdu7cg5oy5juyimgzpp4tzkfzcuru33 Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="BA/7VVkz"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=3KkNx19Z; spf=pass (imf05.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 39A305092A2A X-HE-Tag: 1637030757-52847 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue. This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO). This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 7 ------- net/sunrpc/xprt.c | 11 ----------- 2 files changed, 18 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index d5b6e897f5a5..256302bf6557 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue, /* * Add new request to wait queue. - * - * Swapper tasks always get inserted at the head of the queue. - * This should avoid many nasty memory deadlocks and hopefully - * improve overall performance. - * Everyone else gets appended to the queue to ensure proper FIFO behavior. */ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task, @@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, INIT_LIST_HEAD(&task->u.tk_wait.timer_list); if (RPC_IS_PRIORITY(queue)) __rpc_add_wait_queue_priority(queue, task, queue_priority); - else if (RPC_IS_SWAPPER(task)) - list_add(&task->u.tk_wait.list, &queue->tasks[0]); else list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]); task->tk_waitqueue = queue; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 47d207e416ab..a0a2583fe941 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1354,17 +1354,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task) INIT_LIST_HEAD(&req->rq_xmit2); goto out; } - } else if (RPC_IS_SWAPPER(task)) { - list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { - if (pos->rq_cong || pos->rq_bytes_sent) - continue; - if (RPC_IS_SWAPPER(pos->rq_task)) - continue; - /* Note: req is added _before_ pos */ - list_add_tail(&req->rq_xmit, &pos->rq_xmit); - INIT_LIST_HEAD(&req->rq_xmit2); - goto out; - } } else if (!req->rq_seqno) { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner) From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621143 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7048DC433F5 for ; Tue, 16 Nov 2021 02:46:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 21F6D61BF8 for ; Tue, 16 Nov 2021 02:46:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 21F6D61BF8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id B34786B00B9; Mon, 15 Nov 2021 21:46:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id AE2676B00BA; Mon, 15 Nov 2021 21:46:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 95BD96B00BB; Mon, 15 Nov 2021 21:46:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0254.hostedemail.com [216.40.44.254]) by kanga.kvack.org (Postfix) with ESMTP id 856486B00B9 for ; Mon, 15 Nov 2021 21:46:31 -0500 (EST) Received: from smtpin34.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 45B161816C8B4 for ; Tue, 16 Nov 2021 02:46:31 +0000 (UTC) X-FDA: 78813254982.34.647A883 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf05.hostedemail.com (Postfix) with ESMTP id AB6AB5092A39 for ; Tue, 16 Nov 2021 02:46:04 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id B358C2193C; Tue, 16 Nov 2021 02:46:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030788; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wO0r1Q1BLtXip2CPbwTG/6Ut5fpHeAZt8oAWaKb7z2w=; b=IWtnEJbvd8DYW2oMcEU1fczbOAygAipGuytwF6nHbM20wr4CgabUm8RW5VRkYvt81lKZNZ VUmTwkY1Jpfr7RLnbkSCtjlVGfa0Jqw1yn/udGEQl6NLlI5YgwC/Foc2wtnGgIE7u9qZZU XPQorHS+Lwos+3IA3DXJ3prh6EDzGQU= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030788; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wO0r1Q1BLtXip2CPbwTG/6Ut5fpHeAZt8oAWaKb7z2w=; b=t4Eg3yrplfyW5VHY+J42FbZmmLbrJOjEWLmzylPbexKrTngyeE9Gr+i7ja0CSR+r5Tq+g0 8rq6txu++cKV+NAA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5E21E13B70; Tue, 16 Nov 2021 02:46:26 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id H8SxB4Ibk2EHCQAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:26 +0000 Subject: [PATCH 08/13] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064455.25805.7846689888578353600.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: d8xbpphwh4itnqhzkigi75886jy5e8if Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=IWtnEJbv; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=t4Eg3yrp; spf=pass (imf05.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: AB6AB5092A39 X-HE-Tag: 1637030764-920736 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown --- fs/nfs/read.c | 4 ---- include/linux/nfs_fs.h | 5 ----- include/linux/sunrpc/sched.h | 1 - include/trace/events/sunrpc.h | 1 - net/sunrpc/auth.c | 2 +- 5 files changed, 1 insertion(+), 12 deletions(-) diff --git a/fs/nfs/read.c b/fs/nfs/read.c index d11af2a9299c..a8f2b884306c 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -194,10 +194,6 @@ static void nfs_initiate_read(struct nfs_pgio_header *hdr, const struct nfs_rpc_ops *rpc_ops, struct rpc_task_setup *task_setup_data, int how) { - struct inode *inode = hdr->inode; - int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0; - - task_setup_data->flags |= swap_flags; rpc_ops->read_setup(hdr, msg); trace_nfs_initiate_read(hdr); } diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 05f249f20f55..5a605e51f4b1 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -45,11 +45,6 @@ */ #define NFS_MAX_TRANSPORTS 16 -/* - * These are the default flags for swap requests - */ -#define NFS_RPC_SWAPFLAGS (RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS) - /* * Size of the NFS directory verifier */ diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index db964bb63912..56710f8056d3 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -124,7 +124,6 @@ struct rpc_task_setup { #define RPC_TASK_MOVEABLE 0x0004 /* nfs4.1+ rpc tasks */ #define RPC_TASK_NULLCREDS 0x0010 /* Use AUTH_NULL credential */ #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ -#define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ #define RPC_TASK_NO_ROUND_ROBIN 0x0100 /* send requests on "main" xprt */ #define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */ diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 3a99358c262b..141dc34a450e 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -311,7 +311,6 @@ TRACE_EVENT(rpc_request, { RPC_TASK_MOVEABLE, "MOVEABLE" }, \ { RPC_TASK_NULLCREDS, "NULLCREDS" }, \ { RPC_CALL_MAJORSEEN, "MAJORSEEN" }, \ - { RPC_TASK_ROOTCREDS, "ROOTCREDS" }, \ { RPC_TASK_DYNAMIC, "DYNAMIC" }, \ { RPC_TASK_NO_ROUND_ROBIN, "NO_ROUND_ROBIN" }, \ { RPC_TASK_SOFT, "SOFT" }, \ diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 6bfa19f9fa6a..682fcd24bf43 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -670,7 +670,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) /* If machine cred couldn't be bound, try a root cred */ if (new) ; - else if (cred == &machine_cred || (flags & RPC_TASK_ROOTCREDS)) + else if (cred == &machine_cred) new = rpcauth_bind_root_cred(task, lookupflags); else if (flags & RPC_TASK_NULLCREDS) new = authnull_ops.lookup_cred(NULL, NULL, 0); From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621145 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE47BC433F5 for ; Tue, 16 Nov 2021 02:46:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 87AC861BF8 for ; Tue, 16 Nov 2021 02:46:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 87AC861BF8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 20BD26B00AC; Mon, 15 Nov 2021 21:46:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1B9AB6B00BA; Mon, 15 Nov 2021 21:46:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A9546B00BB; Mon, 15 Nov 2021 21:46:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0178.hostedemail.com [216.40.44.178]) by kanga.kvack.org (Postfix) with ESMTP id EDD206B00AC for ; Mon, 15 Nov 2021 21:46:39 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id B3808181BA3E2 for ; Tue, 16 Nov 2021 02:46:39 +0000 (UTC) X-FDA: 78813255318.18.849D095 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf06.hostedemail.com (Postfix) with ESMTP id 57C7D801A8A8 for ; Tue, 16 Nov 2021 02:46:38 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 437EA21940; Tue, 16 Nov 2021 02:46:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030797; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ff9oRmhTW10neGZIRFtx2mUw/Qy3otVVDUnEPq8/znE=; b=J6o64U2KVKm4XhX9AIVL4KR9EA8iwtu9KQWWRg8saOkXD9aSrZbNhbziYFiSHdOP6okHHG u+L3Vc17G1OTlUPYm/o0MWWs6U0cUNBOBguTgxx82HgTeu66DxsDFndMf8qWHuxzfObV6i AhWRUJrjw/q++idN3nwIka0TOB4KUGY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030797; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ff9oRmhTW10neGZIRFtx2mUw/Qy3otVVDUnEPq8/znE=; b=ZRPDR+/JPH1ZqjOuUwxf2xzY5kUonxdG9aiu/m4ahCYfFkEU+CFCXOjGFRXBh1mBT2KsFj lXUS0Hpq6dBkRrBA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id E1E7713B70; Tue, 16 Nov 2021 02:46:34 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id mSDBJ4obk2ERCQAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:34 +0000 Subject: [PATCH 09/13] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064455.25805.9135476984725088620.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: 33157p5fjnxakquiytua6rw7uj5p6uax Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=J6o64U2K; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="ZRPDR+/J"; spf=pass (imf06.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 57C7D801A8A8 X-HE-Tag: 1637030798-835521 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: rpc tasks can be marked as RPC_TASK_SWAPPER. This causes GFP_MEMALLOC to be used for some allocations. This is needed in some cases, but not in all where it is currently provided, and in some where it isn't provided. Currently *all* tasks associated with a rpc_client on which swap is enabled get the flag and hence some GFP_MEMALLOC support. GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it. However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does need it. xdr_alloc_bvec is called while the XPRT_LOCK is held. If this blocks, then it blocks all other queued tasks. So this allocation needs GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used for any swap writes. Similarly, if the transport is not connected, that will block all requests including swap writes, so memory allocations should get GFP_MEMALLOC if swap writes are possible. So with this patch: 1/ we ONLY set RPC_TASK_SWAPPER for swap writes. 2/ __rpc_execute() sets PF_MEMALLOC while handling any task with RPC_TASK_SWAPPER set, or when handling any task that holds the XPRT_LOCKED lock on an xprt used for swap. This removes the need for the RPC_IS_SWAPPER() test in ->buf_alloc handlers. 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking any task to a swapper xprt. __rpc_execute() will clear it. 3/ PF_MEMALLOC is set for all the connect workers. Signed-off-by: NeilBrown --- fs/nfs/write.c | 2 ++ net/sunrpc/clnt.c | 2 -- net/sunrpc/sched.c | 20 +++++++++++++++++--- net/sunrpc/xprt.c | 3 +++ net/sunrpc/xprtrdma/transport.c | 6 ++++-- net/sunrpc/xprtsock.c | 8 ++++++++ 6 files changed, 34 insertions(+), 7 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 9b7619ce17a7..0c7a304c9e74 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1408,6 +1408,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr, { int priority = flush_task_priority(how); + if (IS_SWAPFILE(hdr->inode)) + task_setup_data->flags |= RPC_TASK_SWAPPER; task_setup_data->priority = priority; rpc_ops->write_setup(hdr, msg, &task_setup_data->rpc_client); trace_nfs_initiate_write(hdr); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 238b2ef5491f..cb76fbea3ed5 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1085,8 +1085,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt) task->tk_flags |= RPC_TASK_TIMEOUT; if (clnt->cl_noretranstimeo) task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT; - if (atomic_read(&clnt->cl_swapper)) - task->tk_flags |= RPC_TASK_SWAPPER; /* Add to the client's list of all tasks */ spin_lock(&clnt->cl_lock); list_add_tail(&task->tk_task, &clnt->cl_tasks); diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 256302bf6557..9020cedb7c95 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -869,6 +869,15 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata) ops->rpc_release(calldata); } +static bool xprt_needs_memalloc(struct rpc_xprt *xprt, struct rpc_task *tk) +{ + if (!xprt) + return false; + if (!atomic_read(&xprt->swapper)) + return false; + return test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == tk; +} + /* * This is the RPC `scheduler' (or rather, the finite state machine). */ @@ -877,6 +886,7 @@ static void __rpc_execute(struct rpc_task *task) struct rpc_wait_queue *queue; int task_is_async = RPC_IS_ASYNC(task); int status = 0; + unsigned long pflags = current->flags; WARN_ON_ONCE(RPC_IS_QUEUED(task)); if (RPC_IS_QUEUED(task)) @@ -899,6 +909,10 @@ static void __rpc_execute(struct rpc_task *task) } if (!do_action) break; + if (RPC_IS_SWAPPER(task) || + xprt_needs_memalloc(task->tk_xprt, task)) + current->flags |= PF_MEMALLOC; + trace_rpc_task_run_action(task, do_action); do_action(task); @@ -936,7 +950,7 @@ static void __rpc_execute(struct rpc_task *task) rpc_clear_running(task); spin_unlock(&queue->lock); if (task_is_async) - return; + goto out; /* sync task: sleep here */ trace_rpc_task_sync_sleep(task, task->tk_action); @@ -960,6 +974,8 @@ static void __rpc_execute(struct rpc_task *task) /* Release all resources associated with the task */ rpc_release_task(task); +out: + current_restore_flags(pflags, PF_MEMALLOC); } /* @@ -1018,8 +1034,6 @@ int rpc_malloc(struct rpc_task *task) if (RPC_IS_ASYNC(task)) gfp = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a0a2583fe941..0614e7463d4b 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1492,6 +1492,9 @@ bool xprt_prepare_transmit(struct rpc_task *task) return false; } + if (atomic_read(&xprt->swapper)) + /* This will be clear in __rpc_execute */ + current->flags |= PF_MEMALLOC; return true; } diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 32df23796747..256b06a92391 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -239,8 +239,11 @@ xprt_rdma_connect_worker(struct work_struct *work) struct rpcrdma_xprt *r_xprt = container_of(work, struct rpcrdma_xprt, rx_connect_worker.work); struct rpc_xprt *xprt = &r_xprt->rx_xprt; + unsigned int pflags = current->flags; int rc; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; rc = rpcrdma_xprt_connect(r_xprt); xprt_clear_connecting(xprt); if (!rc) { @@ -254,6 +257,7 @@ xprt_rdma_connect_worker(struct work_struct *work) rpcrdma_xprt_disconnect(r_xprt); xprt_unlock_connect(xprt, r_xprt); xprt_wake_pending_tasks(xprt, rc); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -576,8 +580,6 @@ xprt_rdma_allocate(struct rpc_task *task) flags = RPCRDMA_DEF_GFP; if (RPC_IS_ASYNC(task)) flags = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index ae48c9c84ee1..062ff1f37702 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2047,7 +2047,10 @@ static void xs_udp_setup_socket(struct work_struct *work) struct rpc_xprt *xprt = &transport->xprt; struct socket *sock; int status = -EIO; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_DGRAM, IPPROTO_UDP, false); @@ -2067,6 +2070,7 @@ static void xs_udp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); xprt_unlock_connect(xprt, transport); xprt_wake_pending_tasks(xprt, status); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -2226,7 +2230,10 @@ static void xs_tcp_setup_socket(struct work_struct *work) struct socket *sock = transport->sock; struct rpc_xprt *xprt = &transport->xprt; int status; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; if (!sock) { sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_STREAM, @@ -2291,6 +2298,7 @@ static void xs_tcp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); out_unlock: xprt_unlock_connect(xprt, transport); + current_restore_flags(pflags, PF_MEMALLOC); } /** From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621147 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA4B6C433EF for ; Tue, 16 Nov 2021 02:46:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 52A606187F for ; Tue, 16 Nov 2021 02:46:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 52A606187F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id DF12A6B00BA; Mon, 15 Nov 2021 21:46:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id DA11E6B00BB; Mon, 15 Nov 2021 21:46:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C1AE06B00BC; Mon, 15 Nov 2021 21:46:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0095.hostedemail.com [216.40.44.95]) by kanga.kvack.org (Postfix) with ESMTP id B0C036B00BA for ; Mon, 15 Nov 2021 21:46:45 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 79A938249980 for ; Tue, 16 Nov 2021 02:46:45 +0000 (UTC) X-FDA: 78813255570.14.2715EB1 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf14.hostedemail.com (Postfix) with ESMTP id 5E56C600198D for ; Tue, 16 Nov 2021 02:46:45 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 8978E1FD6D; Tue, 16 Nov 2021 02:46:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030803; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bh1jN8WGzIi8TEk0QRz2HpNXbcNghbMdw1FTZGlL1Tk=; b=IOKHEEPION2gMxq6pdocbPlQV7E2toIftm5puTYicudmr2y13LSJ+3PNCZKabPzv9GYkBy ShDN3Vzj43Xssc825UNXCOZhHuP5r0NjJ1A9y8GpQeGz18U47wpB7BOnCa0nBZwxRvdVk7 PR2+5Mr3kZUsf6D7O0STIeP8q5sSIjA= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030803; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bh1jN8WGzIi8TEk0QRz2HpNXbcNghbMdw1FTZGlL1Tk=; b=3198S1+iRUtdM6P1zaeNWvfx8gO2+zobdgNjBqrQI5VM/r6t86DSodV8TezxiF7k+9Qgag Ns9zTD98yVSLTKBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 348B613B70; Tue, 16 Nov 2021 02:46:40 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 3QktOZAbk2EWCQAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:40 +0000 Subject: [PATCH 10/13] NFSv4: keep state manager thread active if swap is enabled From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064456.25805.6410751202807765149.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: jca35xz7h1ez97ry6p79zsntxy6t9eu7 Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=IOKHEEPI; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=3198S1+i; spf=pass (imf14.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 5E56C600198D X-HE-Tag: 1637030805-170706 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If we are swapping over NFSv4, we may not be able to allocate memory to start the state-manager thread at the time when we need it. So keep it always running when swap is enabled, and just signal it to start. This requires updating and testing the cl_swapper count on the root rpc_clnt after following all ->cl_parent links. Signed-off-by: NeilBrown --- fs/nfs/file.c | 19 +++++++++++++++---- fs/nfs/nfs4_fs.h | 1 + fs/nfs/nfs4proc.c | 20 ++++++++++++++++++++ fs/nfs/nfs4state.c | 39 +++++++++++++++++++++++++++++++++------ include/linux/nfs_xdr.h | 2 ++ net/sunrpc/clnt.c | 2 ++ 6 files changed, 73 insertions(+), 10 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 45d8180b7be3..59c271f42ea5 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -489,8 +489,10 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, { unsigned long blocks; long long isize; - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); - struct inode *inode = file->f_mapping->host; + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; + int ret; spin_lock(&inode->i_lock); blocks = inode->i_blocks; @@ -503,14 +505,23 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, *span = sis->pages; - return rpc_clnt_swap_activate(clnt); + ret = rpc_clnt_swap_activate(clnt); + + if (ret == 0 && cl->rpc_ops->enable_swap) + cl->rpc_ops->enable_swap(inode); + + return ret; } static void nfs_swap_deactivate(struct file *file) { - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; rpc_clnt_swap_deactivate(clnt); + if (cl->rpc_ops->disable_swap) + cl->rpc_ops->disable_swap(file_inode(file)); } const struct address_space_operations nfs_file_aops = { diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index ed5eaca6801e..8a9ce0f42efd 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -42,6 +42,7 @@ enum nfs4_client_state { NFS4CLNT_LEASE_MOVED, NFS4CLNT_DELEGATION_EXPIRED, NFS4CLNT_RUN_MANAGER, + NFS4CLNT_MANAGER_AVAILABLE, NFS4CLNT_RECALL_RUNNING, NFS4CLNT_RECALL_ANY_LAYOUT_READ, NFS4CLNT_RECALL_ANY_LAYOUT_RW, diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index ee3bc79f6ca3..ab6382f9cbf0 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -10347,6 +10347,24 @@ static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size) return error + error2 + error3; } +static void nfs4_enable_swap(struct inode *inode) +{ + /* The state manager thread must always be running. + * It will notice the client is a swapper, and stay put. + */ + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; + + nfs4_schedule_state_manager(clp); +} + +static void nfs4_disable_swap(struct inode *inode) +{ + /* The state manager thread will now exit once it is + * woken. + */ + wake_up_var(&NFS_SERVER(inode)->nfs_client->cl_state); +} + static const struct inode_operations nfs4_dir_inode_operations = { .create = nfs_create, .lookup = nfs_lookup, @@ -10423,6 +10441,8 @@ const struct nfs_rpc_ops nfs_v4_clientops = { .free_client = nfs4_free_client, .create_server = nfs4_create_server, .clone_server = nfs_clone_server, + .enable_swap = nfs4_enable_swap, + .disable_swap = nfs4_disable_swap, }; static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index ecc4594299d6..2408e9b98f68 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1205,10 +1205,17 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) { struct task_struct *task; char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1]; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state); - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) + if (test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) != 0) { + wake_up_var(&clp->cl_state); return; + } + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); __module_get(THIS_MODULE); refcount_inc(&clp->cl_count); @@ -1224,6 +1231,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) printk(KERN_ERR "%s: kthread_run: %ld\n", __func__, PTR_ERR(task)); nfs4_clear_state_manager_bit(clp); + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); nfs_put_client(clp); module_put(THIS_MODULE); } @@ -2661,11 +2669,8 @@ static void nfs4_state_manager(struct nfs_client *clp) clear_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state); } - /* Did we race with an attempt to give us more work? */ - if (!test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) - return; - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) - return; + return; + } while (refcount_read(&clp->cl_count) > 1 && !signalled()); goto out_drain; @@ -2685,9 +2690,31 @@ static void nfs4_state_manager(struct nfs_client *clp) static int nfs4_run_state_manager(void *ptr) { struct nfs_client *clp = ptr; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; allow_signal(SIGKILL); +again: + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); nfs4_state_manager(clp); + if (atomic_read(&cl->cl_swapper)) { + wait_var_event_interruptible(&clp->cl_state, + test_bit(NFS4CLNT_RUN_MANAGER, + &clp->cl_state)); + if (atomic_read(&cl->cl_swapper) && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) + goto again; + /* Either no longer a swapper, or were signalled */ + } + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); + + if (refcount_read(&clp->cl_count) > 1 && !signalled() && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) && + !test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state)) + goto again; + nfs_put_client(clp); module_put_and_exit(0); return 0; diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 967a0098f0a9..04cf3a8fb949 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1795,6 +1795,8 @@ struct nfs_rpc_ops { struct nfs_server *(*create_server)(struct fs_context *); struct nfs_server *(*clone_server)(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, rpc_authflavor_t); + void (*enable_swap)(struct inode *inode); + void (*disable_swap)(struct inode *inode); }; /* diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index cb76fbea3ed5..4cb403a0f334 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -3066,6 +3066,8 @@ rpc_clnt_swap_activate_callback(struct rpc_clnt *clnt, int rpc_clnt_swap_activate(struct rpc_clnt *clnt) { + while (clnt != clnt->cl_parent) + clnt = clnt->cl_parent; if (atomic_inc_return(&clnt->cl_swapper) == 1) return rpc_clnt_iterate_for_each_xprt(clnt, rpc_clnt_swap_activate_callback, NULL); From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621149 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71CD4C433EF for ; Tue, 16 Nov 2021 02:46:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1ED416322A for ; Tue, 16 Nov 2021 02:46:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 1ED416322A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id B5AA66B00AF; Mon, 15 Nov 2021 21:46:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B09816B00BB; Mon, 15 Nov 2021 21:46:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 983A56B00BC; Mon, 15 Nov 2021 21:46:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0090.hostedemail.com [216.40.44.90]) by kanga.kvack.org (Postfix) with ESMTP id 88A566B00AF for ; Mon, 15 Nov 2021 21:46:54 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 51E438249980 for ; Tue, 16 Nov 2021 02:46:54 +0000 (UTC) X-FDA: 78813255906.17.339377B Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf03.hostedemail.com (Postfix) with ESMTP id 18DE230000B7 for ; Tue, 16 Nov 2021 02:46:43 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C08E01FD6E; Tue, 16 Nov 2021 02:46:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030812; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ilqBwRNcKCWrWikYfICWiUGsbiDnIfpp6dIzn05L/u0=; b=bxOgH2KDZq5WM3FoP6Hh8QNS3dDmYdj+VdHSkOCn5TaCua9nhQnZJ5j0OMTAs7i9ksqJfQ 7r2QXkHt7QPW4DPKehojOONQMslkVY3Bs2DqimQIn7QxwvcxtVtGafL9N0oUEMggYfIW6e induzPCcnkKaCwsfjr5mBC7mqtYGRSE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030812; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ilqBwRNcKCWrWikYfICWiUGsbiDnIfpp6dIzn05L/u0=; b=J6AaypphTXdJwcS+m7MJGN9qLmx78KXfjbO2OjSvoBgV5QKOlL2pGzUnKSOEtDgXjBZvFh JDE7foDThmS2U/Aw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 6A52913B70; Tue, 16 Nov 2021 02:46:50 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id t4CfCpobk2EmCQAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:50 +0000 Subject: [PATCH 11/13] NFS: swap-out must always use STABLE writes. From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064457.25805.5324231084334985723.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 18DE230000B7 X-Stat-Signature: 5jygroebsadcig3nicp4rfxhx9a6bnxb Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=bxOgH2KD; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=J6Aaypph; spf=pass (imf03.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1637030803-805241 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes use for swap must always use FLUSH_COND_STABLE. Signed-off-by: NeilBrown --- fs/nfs/direct.c | 12 +++++++----- fs/nfs/file.c | 2 +- include/linux/nfs_fs.h | 3 ++- 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 1e80d243ba25..8d3b12402725 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -173,7 +173,7 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) if (iov_iter_rw(iter) == READ) return nfs_file_direct_read(iocb, iter); - return nfs_file_direct_write(iocb, iter); + return nfs_file_direct_write(iocb, iter, FLUSH_STABLE); } static void nfs_direct_release_pages(struct page **pages, unsigned int npages) @@ -789,7 +789,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { */ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, struct iov_iter *iter, - loff_t pos) + loff_t pos, int ioflags) { struct nfs_pageio_descriptor desc; struct inode *inode = dreq->inode; @@ -797,7 +797,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t requested_bytes = 0; size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE); - nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false, + nfs_pageio_init_write(&desc, inode, ioflags, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; get_dreq(dreq); @@ -875,6 +875,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_write - file direct write operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers from which to write data + * @ioflags: flags for nfs_pageio_init_write() * * We use this function for direct writes instead of calling * generic_file_aio_write() in order to avoid taking the inode @@ -891,7 +892,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * Note that O_APPEND is not supported for NFS direct writes, as there * is no atomic O_APPEND write facility in the NFS protocol. */ -ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, + int ioflags) { ssize_t result, requested; size_t count; @@ -935,7 +937,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) nfs_start_io_direct(inode); - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, ioflags); if (mapping->nrpages) { invalidate_inode_pages2_range(mapping, diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 59c271f42ea5..878a6a510a5e 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -630,7 +630,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) result = generic_write_checks(iocb, from); if (result <= 0) return result; - return nfs_file_direct_write(iocb, from); + return nfs_file_direct_write(iocb, from, FLUSH_COND_STABLE); } dprintk("NFS: write(%pD2, %zu@%Ld)\n", diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 5a605e51f4b1..ca312aea6bec 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -509,7 +509,8 @@ extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); extern ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter); extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); + struct iov_iter *iter, + int ioflags); /* * linux/fs/nfs/dir.c From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621151 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46FB7C433F5 for ; Tue, 16 Nov 2021 02:47:05 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E8DEE61C15 for ; Tue, 16 Nov 2021 02:47:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E8DEE61C15 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 7FEFF6B00BB; Mon, 15 Nov 2021 21:47:04 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7ACD36B00BC; Mon, 15 Nov 2021 21:47:04 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 64F206B00BD; Mon, 15 Nov 2021 21:47:04 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0240.hostedemail.com [216.40.44.240]) by kanga.kvack.org (Postfix) with ESMTP id 5530E6B00BB for ; Mon, 15 Nov 2021 21:47:04 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 2122075961 for ; Tue, 16 Nov 2021 02:47:04 +0000 (UTC) X-FDA: 78813256368.23.BC856BA Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf09.hostedemail.com (Postfix) with ESMTP id 77F2E3000106 for ; Tue, 16 Nov 2021 02:47:03 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 648AF2191A; Tue, 16 Nov 2021 02:47:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030822; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5S+tNHKYQRjSQpQ0rt9HAN4Yso3tuzIn6GjWJ7BDcbM=; b=RiR+NzovbNVimxXKnbO8sg+DB9mcHvi+oRfRQufGEKJA+HSI41DLd7/p8q7wjVwvK083Vj U+2Lq3jN9iy3FDtRDL+kknUUPQGwezjiXSVpsUDK18wVcKllnh8rQRMIvSJJISD8ewaP0s wttat1fAXecoSOuoGTG/M4gyhvrpqDo= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030822; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5S+tNHKYQRjSQpQ0rt9HAN4Yso3tuzIn6GjWJ7BDcbM=; b=lDDHnMQnFJWN7BiK5p99csvV+JYZ9vVilNR0Qfa1eWNMUOemxSUgy1qksLpeBzfz6hGmKn rSXnnqtgDrv+MdAg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 01AEE13B70; Tue, 16 Nov 2021 02:46:59 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 7pekLKMbk2E0CQAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:46:59 +0000 Subject: [PATCH 12/13] MM: use AIO/DIO for reads from SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064458.25805.6777856691611196478.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 77F2E3000106 X-Stat-Signature: sghs9hzozdy17u1qomhhgg88rcidr7sc Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=RiR+Nzov; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=lDDHnMQn; spf=pass (imf09.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1637030823-84051 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When pages a read from SWP_FS_OPS swap-space, the reads are submitted as separate reads for each page. This is generally less efficient than larger reads. We can use the block-plugging infrastructure to delay submitting the read request until multiple contigious pages have been collected. This requires using ->direct_IO to submit the read (as ->readpages isn't suitable for swap). If the caller schedules before unplugging, we hand the read-in task off to systemwq to avoid any possible locking issues. Signed-off-by: NeilBrown --- mm/page_io.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 103 insertions(+), 4 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 9725c7e1eeea..30d613881995 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -282,6 +282,14 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) #define bio_associate_blkg_from_page(bio, page) do { } while (0) #endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */ +struct swap_iocb { + struct blk_plug_cb cb; /* Must be first */ + struct kiocb iocb; + struct bio_vec bvec[SWAP_CLUSTER_MAX]; + struct work_struct work; + int pages; +}; + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -353,6 +361,59 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +static void sio_read_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + int p; + + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + + if (ret != PAGE_SIZE * sio->pages) { + SetPageError(page); + ClearPageUptodate(page); + pr_alert_ratelimited("Read-error on swap-device\n"); + } else { + SetPageUptodate(page); + count_vm_event(PSWPIN); + } + unlock_page(page); + } + kfree(sio); +} + +static void sio_read_unplug(struct blk_plug_cb *cb, bool from_schedule); + +static void sio_read_unplug_worker(struct work_struct *work) +{ + struct swap_iocb *sio = container_of(work, struct swap_iocb, work); + sio_read_unplug(&sio->cb, 0); +} + +static void sio_read_unplug(struct blk_plug_cb *cb, bool from_schedule) +{ + struct swap_iocb *sio = container_of(cb, struct swap_iocb, cb); + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + struct iov_iter from; + unsigned int nofs_flag; + int ret; + + if (from_schedule) { + INIT_WORK(&sio->work, sio_read_unplug_worker); + schedule_work(&sio->work); + return; + } + + iov_iter_bvec(&from, READ, sio->bvec, + sio->pages, PAGE_SIZE * sio->pages); + /* nofs needs as ->direct_IO may take the same mutex it takes for write */ + nofs_flag = memalloc_nofs_save(); + ret = mapping->a_ops->direct_IO(&sio->iocb, &from); + memalloc_nofs_restore(nofs_flag); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -380,10 +441,48 @@ int swap_readpage(struct page *page, bool synchronous) if (data_race(sis->flags & SWP_FS_OPS)) { struct file *swap_file = sis->swap_file; struct address_space *mapping = swap_file->f_mapping; - - ret = mapping->a_ops->readpage(swap_file, page); - if (!ret) - count_vm_event(PSWPIN); + struct blk_plug_cb *cb; + struct swap_iocb *sio; + loff_t pos = page_file_offset(page); + struct blk_plug plug; + int p; + + /* We are sometimes called without a plug active. + * By calling blk_start_plug() here, we ensure blk_check_plugged + * only fails if memory allocation fails. + */ + blk_start_plug(&plug); + cb = blk_check_plugged(sio_read_unplug, swap_file, sizeof(*sio)); + sio = container_of(cb, struct swap_iocb, cb); + if (cb && sio->pages && + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + /* Not contiguous - hide this sio from lookup */ + cb->data = NULL; + cb = blk_check_plugged(sio_read_unplug, swap_file, + sizeof(*sio)); + sio = container_of(cb, struct swap_iocb, cb); + } + if (!cb) { + blk_finish_plug(&plug); + ret = mapping->a_ops->readpage(swap_file, page); + if (!ret) + count_vm_event(PSWPIN); + goto out; + } + if (sio->pages == 0) { + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + } + p = sio->pages; + sio->bvec[p].bv_page = page; + sio->bvec[p].bv_len = PAGE_SIZE; + sio->bvec[p].bv_offset = 0; + p += 1; + sio->pages = p; + if (p == ARRAY_SIZE(sio->bvec)) + cb->data = NULL; + blk_finish_plug(&plug); goto out; } From patchwork Tue Nov 16 02:44:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12621153 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 907A0C433EF for ; Tue, 16 Nov 2021 02:47:17 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3B6B961BF8 for ; Tue, 16 Nov 2021 02:47:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 3B6B961BF8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id CB8626B00BF; Mon, 15 Nov 2021 21:47:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C676D6B00C0; Mon, 15 Nov 2021 21:47:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B30BE6B00C1; Mon, 15 Nov 2021 21:47:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id A34FA6B00BF for ; Mon, 15 Nov 2021 21:47:16 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 62C03181BF883 for ; Tue, 16 Nov 2021 02:47:16 +0000 (UTC) X-FDA: 78813256872.22.60868E8 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf07.hostedemail.com (Postfix) with ESMTP id 4A12010000BA for ; Tue, 16 Nov 2021 02:47:15 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 335571FD6C; Tue, 16 Nov 2021 02:47:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1637030834; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VVPL+cBsnPcJ/XftWmdbCLVfibDIhJVueSGjUUmK5KU=; b=YUNvL1vzNP6tc4e++3cILebMtiqR5ZyisD+djv+vdV3l2KuqbnoTXfP/nevxDhMdNMbfhA +jyfQOjL4Q6wbJcV8mMQ7Z3CEZiE/h/FB/FxUspS2/4/WQ5Pyp7GdQjb9zwpgUWkKugyYR Itw+wMJ6/XhOxbz2YvQgR/FrGhUR/Xo= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1637030834; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VVPL+cBsnPcJ/XftWmdbCLVfibDIhJVueSGjUUmK5KU=; b=xpPWJwj/GMiQ5fbX4bXE8GDIPorWPxEGI7Ja2YAVjUnYpTh72JtPZuQYrmI2JOOLWtGgYE fqaBhEG90muNWrDQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D0F8313B70; Tue, 16 Nov 2021 02:47:11 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id XXShI68bk2FMCQAAMHmgww (envelope-from ); Tue, 16 Nov 2021 02:47:11 +0000 Subject: [PATCH 13/13] MM: use AIO for DIO writes to swap From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Tue, 16 Nov 2021 13:44:04 +1100 Message-ID: <163703064458.25805.5272714590032323298.stgit@noble.brown> In-Reply-To: <163702956672.25805.16457749992977493579.stgit@noble.brown> References: <163702956672.25805.16457749992977493579.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=YUNvL1vz; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="xpPWJwj/"; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf07.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 4A12010000BA X-Stat-Signature: dsrbknjgjmhsxokb9ii1bfsgfim9daga X-HE-Tag: 1637030835-273479 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When swap-out goes through the filesystem (as with NFS), we currently perform synchronous writes with ->direct_IO. This serializes swap writes and causes kswapd to block waiting for a writes to complete. This is quite different to swap-out to a block device (always async), and possibly hurts liveness. So switch to AIO writes. If the necessary kiocb structure cannot be allocated, fall back to sync writes using a kiocb on the stack. Signed-off-by: NeilBrown --- mm/page_io.c | 136 ++++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 103 insertions(+), 33 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 30d613881995..59a2d49e53c3 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -25,6 +25,7 @@ #include #include #include +#include "internal.h" void end_swap_bio_write(struct bio *bio) { @@ -288,8 +289,70 @@ struct swap_iocb { struct bio_vec bvec[SWAP_CLUSTER_MAX]; struct work_struct work; int pages; + bool on_stack; }; +static void sio_aio_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + int p; + + if (ret != PAGE_SIZE * sio->pages) { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * rotate_reclaimable_page but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. + */ + pr_err_ratelimited("Write error on dio swapfile (%llu - %d pages)\n", + page_file_offset(sio->bvec[0].bv_page), + sio->pages); + for (p = 0; p < sio->pages; p++) { + set_page_dirty(sio->bvec[p].bv_page); + ClearPageReclaim(sio->bvec[p].bv_page); + } + } + for (p = 0; p < sio->pages; p++) + end_page_writeback(sio->bvec[p].bv_page); + if (!sio->on_stack) + kfree(sio); +} + +static void sio_aio_unplug(struct blk_plug_cb *cb, bool from_schedule); + +static void sio_write_unplug_worker(struct work_struct *work) +{ + struct swap_iocb *sio = container_of(work, struct swap_iocb, work); + sio_aio_unplug(&sio->cb, 0); +} + +static void sio_aio_unplug(struct blk_plug_cb *cb, bool from_schedule) +{ + struct swap_iocb *sio = container_of(cb, struct swap_iocb, cb); + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + struct iov_iter from; + int ret; + unsigned int noreclaim_flag; + + if (from_schedule) { + INIT_WORK(&sio->work, sio_write_unplug_worker); + queue_work(mm_percpu_wq, &sio->work); + return; + } + + noreclaim_flag = memalloc_noreclaim_save(); + iov_iter_bvec(&from, WRITE, sio->bvec, + sio->pages, PAGE_SIZE * sio->pages); + ret = mapping->a_ops->direct_IO(&sio->iocb, &from); + memalloc_noreclaim_restore(noreclaim_flag); + if (ret != -EIOCBQUEUED) + sio_aio_complete(&sio->iocb, ret); +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -299,44 +362,51 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, VM_BUG_ON_PAGE(!PageSwapCache(page), page); if (data_race(sis->flags & SWP_FS_OPS)) { - struct kiocb kiocb; + struct swap_iocb *sio, sio_on_stack; + struct blk_plug_cb *cb; struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct bio_vec bv = { - .bv_page = page, - .bv_len = PAGE_SIZE, - .bv_offset = 0 - }; - struct iov_iter from; - - iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE); - init_sync_kiocb(&kiocb, swap_file); - kiocb.ki_pos = page_file_offset(page); + loff_t pos = page_file_offset(page); + int p; set_page_writeback(page); unlock_page(page); - ret = mapping->a_ops->direct_IO(&kiocb, &from); - if (ret == PAGE_SIZE) { - count_vm_event(PSWPOUT); - ret = 0; - } else { - /* - * In the case of swap-over-nfs, this can be a - * temporary failure if the system has limited - * memory for allocating transmit buffers. - * Mark the page dirty and avoid - * folio_rotate_reclaimable but rate-limit the - * messages but do not flag PageError like - * the normal direct-to-bio case as it could - * be temporary. - */ - set_page_dirty(page); - ClearPageReclaim(page); - pr_err_ratelimited("Write error on dio swapfile (%llu)\n", - page_file_offset(page)); + cb = blk_check_plugged(sio_aio_unplug, swap_file, sizeof(*sio)); + sio = container_of(cb, struct swap_iocb, cb); + if (cb && sio->pages && + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + /* Not contiguous - hide this sio from lookup */ + cb->data = NULL; + cb = blk_check_plugged(sio_aio_unplug, swap_file, + sizeof(*sio)); + sio = container_of(cb, struct swap_iocb, cb); } - end_page_writeback(page); - return ret; + if (!cb) { + sio = &sio_on_stack; + sio->pages = 0; + sio->on_stack = true; + } + + if (sio->pages == 0) { + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_pos = pos; + if (sio != &sio_on_stack) + sio->iocb.ki_complete = sio_aio_complete; + } + p = sio->pages; + sio->bvec[p].bv_page = page; + sio->bvec[p].bv_len = PAGE_SIZE; + sio->bvec[p].bv_offset = 0; + p += 1; + sio->pages = p; + if (!cb) + sio_aio_unplug(&sio->cb, 0); + else if (p >= ARRAY_SIZE(sio->bvec)) + /* Don't try to add to this */ + cb->data = NULL; + + count_vm_event(PSWPOUT); + + return 0; } ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);