From patchwork Thu Feb 9 12:31:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 13134491
From: Jan Kara
To:
Cc: , , John Hubbard, David Howells, David Hildenbrand, Jan Kara
Subject: [PATCH 1/5] mm: Do not reclaim private data from pinned page
Date: Thu, 9 Feb 2023 13:31:53 +0100
Message-Id: <20230209123206.3548-1-jack@suse.cz>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20230209121046.25360-1-jack@suse.cz>
References: <20230209121046.25360-1-jack@suse.cz>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

If the page is pinned, there's no point in trying to reclaim it.
Furthermore if the page is from the page cache we don't want to reclaim
fs-private data from the page because the pinning process may be writing
to the page at any time and reclaiming fs private info on a dirty page
can upset the filesystem (see link below).

Link: https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz
Signed-off-by: Jan Kara
---
 mm/vmscan.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bf3eedf0209c..ab3911a8b116 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1901,6 +1901,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			}
 		}
 
+		/*
+		 * Folio is unmapped now so it cannot be newly pinned anymore.
+		 * No point in trying to reclaim folio if it is pinned.
+		 * Furthermore we don't want to reclaim underlying fs metadata
+		 * if the folio is pinned and thus potentially modified by the
+		 * pinning process as that may upset the filesystem.
+		 */
+		if (folio_maybe_dma_pinned(folio))
+			goto activate_locked;
+
 		mapping = folio_mapping(folio);
 		if (folio_test_dirty(folio)) {
 			/*

From patchwork Thu Feb 9 12:31:54 2023
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 13134490
From: Jan Kara
To:
Cc: , , John Hubbard, David Howells, David Hildenbrand, Jan Kara
Subject: [PATCH 2/5] ext4: Drop workaround for mm reclaiming fs private page data
Date: Thu, 9 Feb 2023 13:31:54 +0100
Message-Id: <20230209123206.3548-2-jack@suse.cz>
In-Reply-To: <20230209121046.25360-1-jack@suse.cz>
References: <20230209121046.25360-1-jack@suse.cz>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

Drop the workaround in the ext4 writeback code that handles the
situation when MM reclaims fs-private page data from a page that is (or
becomes) dirty. After the previous commit this should not happen
anymore.

Signed-off-by: Jan Kara
---
 fs/ext4/inode.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9d9f414f99fe..46078651ce32 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2657,22 +2657,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			wait_on_page_writeback(page);
 			BUG_ON(PageWriteback(page));
 
-			/*
-			 * Should never happen but for buggy code in
-			 * other subsystems that call
-			 * set_page_dirty() without properly warning
-			 * the file system first.  See [1] for more
-			 * information.
-			 *
-			 * [1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz
-			 */
-			if (!page_has_buffers(page)) {
-				ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", page->index);
-				ClearPageDirty(page);
-				unlock_page(page);
-				continue;
-			}
-
 			if (mpd->map.m_len == 0)
 				mpd->first_page = page->index;
 			mpd->next_page = page->index + 1;

From patchwork Thu Feb 9 12:31:55 2023
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 13134495
From: Jan Kara
To:
Cc: , , John Hubbard, David Howells, David Hildenbrand, Jan Kara
Subject: [PATCH 3/5] mm: Do not try to write pinned folio during memory cleaning writeback
Date: Thu, 9 Feb 2023 13:31:55 +0100
Message-Id: <20230209123206.3548-3-jack@suse.cz>
In-Reply-To: <20230209121046.25360-1-jack@suse.cz>
References: <20230209121046.25360-1-jack@suse.cz>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

When a folio is pinned, there is no point in trying to write it during
memory cleaning writeback. We cannot reclaim the folio until it is
unpinned anyway, and we cannot even be sure the folio is really clean.
On top of that, writeback of such a folio may be problematic as the data
can change while the writeback is running, causing checksum or DIF/DIX
failures. So just don't bother doing memory cleaning writeback for
pinned folios.
Signed-off-by: Jan Kara
---
 fs/9p/vfs_addr.c            |  2 +-
 fs/afs/file.c               |  2 +-
 fs/afs/write.c              |  6 +++---
 fs/btrfs/extent_io.c        | 14 +++++++-------
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/inode.c            |  2 +-
 fs/btrfs/subpage.c          |  2 +-
 fs/ceph/addr.c              |  6 +++---
 fs/cifs/file.c              |  6 +++---
 fs/ext4/inode.c             |  4 ++--
 fs/f2fs/checkpoint.c        |  4 ++--
 fs/f2fs/compress.c          |  2 +-
 fs/f2fs/data.c              |  2 +-
 fs/f2fs/dir.c               |  2 +-
 fs/f2fs/gc.c                |  4 ++--
 fs/f2fs/inline.c            |  2 +-
 fs/f2fs/node.c              | 10 +++++-----
 fs/fuse/file.c              |  2 +-
 fs/gfs2/aops.c              |  2 +-
 fs/nfs/write.c              |  2 +-
 fs/nilfs2/page.c            |  2 +-
 fs/nilfs2/segment.c         |  8 ++++----
 fs/orangefs/inode.c         |  2 +-
 fs/ubifs/file.c             |  2 +-
 include/linux/pagemap.h     |  5 +++--
 mm/folio-compat.c           |  4 ++--
 mm/migrate.c                |  2 +-
 mm/page-writeback.c         | 24 ++++++++++++++++++++----
 mm/vmscan.c                 |  2 +-
 29 files changed, 73 insertions(+), 56 deletions(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 97599edbc300..a14ff3c02eb1 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -221,7 +221,7 @@ static int v9fs_launder_folio(struct folio *folio)
 {
 	int retval;
 
-	if (folio_clear_dirty_for_io(folio)) {
+	if (folio_clear_dirty_for_io(NULL, folio)) {
 		retval = v9fs_vfs_write_folio_locked(folio);
 		if (retval)
 			return retval;
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 68d6d5dc608d..8a81ac9c12fa 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -453,7 +453,7 @@ static void afs_invalidate_dirty(struct folio *folio, size_t offset,
 
 undirty:
 	trace_afs_folio_dirty(vnode, tracepoint_string("undirty"), folio);
-	folio_clear_dirty_for_io(folio);
+	folio_clear_dirty_for_io(NULL, folio);
 full_invalidate:
 	trace_afs_folio_dirty(vnode, tracepoint_string("inval"), folio);
 	folio_detach_private(folio);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 19df10d63323..9a5e6d59040c 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -555,7 +555,7 @@ static void afs_extend_writeback(struct address_space *mapping,
 			folio = page_folio(pvec.pages[i]);
 			trace_afs_folio_dirty(vnode,
 					      tracepoint_string("store+"), folio);
 
-			if (!folio_clear_dirty_for_io(folio))
+			if (!folio_clear_dirty_for_io(NULL, folio))
 				BUG();
 			if (folio_start_writeback(folio))
 				BUG();
@@ -769,7 +769,7 @@ static int afs_writepages_region(struct address_space *mapping,
 			continue;
 		}
 
-		if (!folio_clear_dirty_for_io(folio))
+		if (!folio_clear_dirty_for_io(NULL, folio))
 			BUG();
 		ret = afs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
 		folio_put(folio);
@@ -1000,7 +1000,7 @@ int afs_launder_folio(struct folio *folio)
 	_enter("{%lx}", folio->index);
 
 	priv = (unsigned long)folio_get_private(folio);
-	if (folio_clear_dirty_for_io(folio)) {
+	if (folio_clear_dirty_for_io(NULL, folio)) {
 		f = 0;
 		t = folio_size(folio);
 		if (folio_test_private(folio)) {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9bd32daa9b9a..2026f567cbdd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -215,7 +215,7 @@ void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end)
 	while (index <= end_index) {
 		page = find_get_page(inode->i_mapping, index);
 		BUG_ON(!page); /* Pages should be in the extent_io_tree */
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 		put_page(page);
 		index++;
 	}
@@ -2590,7 +2590,7 @@ static int write_one_subpage_eb(struct extent_buffer *eb,
 	no_dirty_ebs = btrfs_subpage_clear_and_test_dirty(fs_info, page,
 							  eb->start, eb->len);
 	if (no_dirty_ebs)
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 
 	bio_ctrl->end_io_func = end_bio_subpage_eb_writepage;
 
@@ -2633,7 +2633,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
-		clear_page_dirty_for_io(p);
+		clear_page_dirty_for_io(NULL, p);
 		set_page_writeback(p);
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
 					 bio_ctrl, disk_bytenr, p,
@@ -2655,7 +2655,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	if (unlikely(ret)) {
 		for (; i < num_pages; i++) {
 			struct page *p = eb->pages[i];
-			clear_page_dirty_for_io(p);
+			clear_page_dirty_for_io(NULL, p);
 			unlock_page(p);
 		}
 	}
@@ -3083,7 +3083,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			}
 
 			if (PageWriteback(page) ||
-			    !clear_page_dirty_for_io(page)) {
+			    !clear_page_dirty_for_io(wbc, page)) {
 				unlock_page(page);
 				continue;
 			}
@@ -3174,7 +3174,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 		 */
 		ASSERT(PageLocked(page));
 		ASSERT(PageDirty(page));
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 		ret = __extent_writepage(page, &wbc_writepages, &bio_ctrl);
 		ASSERT(ret <= 0);
 		if (ret < 0) {
@@ -4698,7 +4698,7 @@ static void btree_clear_page_dirty(struct page *page)
 {
 	ASSERT(PageDirty(page));
 	ASSERT(PageLocked(page));
-	clear_page_dirty_for_io(page);
+	clear_page_dirty_for_io(NULL, page);
 	xa_lock_irq(&page->mapping->i_pages);
 	if (!PageDirty(page))
 		__xa_clear_mark(&page->mapping->i_pages,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0d250d052487..02ec9987fc17 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -507,7 +507,7 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 	}
 
 	for (i = 0; i < io_ctl->num_pages; i++)
-		clear_page_dirty_for_io(io_ctl->pages[i]);
+		clear_page_dirty_for_io(NULL, io_ctl->pages[i]);
 
 	return 0;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 98a800b8bd43..26820c697c5b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3002,7 +3002,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		 */
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 		SetPageError(page);
 	}
 	btrfs_page_clear_checked(inode->root->fs_info, page, page_start, PAGE_SIZE);
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index dd46b978ac2c..f85b5a2ccdab 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -515,7 +515,7 @@ void btrfs_subpage_clear_dirty(const struct btrfs_fs_info *fs_info,
 
 	last = btrfs_subpage_clear_and_test_dirty(fs_info, page, start, len);
 	if (last)
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 }
 
 void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index cac4083e387a..ff940c9cd1cf 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -926,7 +926,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 				     folio->index, ceph_wbc.i_size);
 				if ((ceph_wbc.size_stable ||
 				    folio_pos(folio) >= i_size_read(inode)) &&
-				    folio_clear_dirty_for_io(folio))
+				    folio_clear_dirty_for_io(wbc, folio))
 					folio_invalidate(folio, 0, folio_size(folio));
 				folio_unlock(folio);
@@ -948,7 +948,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 				wait_on_page_fscache(page);
 			}
 
-			if (!clear_page_dirty_for_io(page)) {
+			if (!clear_page_dirty_for_io(wbc, page)) {
 				dout("%p !clear_page_dirty_for_io\n", page);
 				unlock_page(page);
 				continue;
 			}
@@ -1282,7 +1282,7 @@ ceph_find_incompatible(struct page *page)
 		/* yay, writeable, do it now (without dropping page lock) */
 		dout(" page %p snapc %p not current, but oldest\n", page, snapc);
-		if (clear_page_dirty_for_io(page)) {
+		if (clear_page_dirty_for_io(NULL, page)) {
 			int r = writepage_nounlock(page, NULL);
 			if (r < 0)
 				return ERR_PTR(r);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 22dfc1f8b4f1..93e36829896e 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2342,7 +2342,7 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
 		for (j = 0; j < nr_pages; j++) {
 			wdata2->pages[j] = wdata->pages[i + j];
 			lock_page(wdata2->pages[j]);
-			clear_page_dirty_for_io(wdata2->pages[j]);
+			clear_page_dirty_for_io(NULL, wdata2->pages[j]);
 		}
 
 		wdata2->sync_mode = wdata->sync_mode;
@@ -2582,7 +2582,7 @@ wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
 		wait_on_page_writeback(page);
 
 		if (PageWriteback(page) ||
-		    !clear_page_dirty_for_io(page)) {
+		    !clear_page_dirty_for_io(wbc, page)) {
 			unlock_page(page);
 			break;
 		}
@@ -5076,7 +5076,7 @@ static int cifs_launder_folio(struct folio *folio)
 
 	cifs_dbg(FYI, "Launder page: %lu\n", folio->index);
 
-	if (folio_clear_dirty_for_io(folio))
+	if (folio_clear_dirty_for_io(&wbc, folio))
 		rc = cifs_writepage_locked(&folio->page, &wbc);
 
 	folio_wait_fscache(folio);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 46078651ce32..7082b6ba8e12 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1616,7 +1616,7 @@ static void mpage_release_unused_pages(struct mpage_da_data *mpd,
 			BUG_ON(folio_test_writeback(folio));
 			if (invalidate) {
 				if (folio_mapped(folio))
-					folio_clear_dirty_for_io(folio);
+					folio_clear_dirty_for_io(NULL, folio);
 				block_invalidate_folio(folio, 0,
 						folio_size(folio));
 				folio_clear_uptodate(folio);
@@ -2106,7 +2106,7 @@ static int mpage_submit_page(struct mpage_da_data *mpd, struct page *page)
 	int err;
 
 	BUG_ON(page->index != mpd->first_page);
-	clear_page_dirty_for_io(page);
+	clear_page_dirty_for_io(NULL, page);
 	/*
 	 * We have to be very careful here!  Nothing protects writeback path
 	 * against i_size changes and the page can be writeably mapped into
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 56f7d0d6a8b2..37f951b11153 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -435,7 +435,7 @@ long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
 
 			f2fs_wait_on_page_writeback(page, META, true, true);
 
-			if (!clear_page_dirty_for_io(page))
+			if (!clear_page_dirty_for_io(NULL, page))
 				goto continue_unlock;
 
 			if (__f2fs_write_meta_page(page, &wbc, io_type)) {
@@ -1415,7 +1415,7 @@ static void commit_checkpoint(struct f2fs_sb_info *sbi,
 		memcpy(page_address(page), src, PAGE_SIZE);
 
 	set_page_dirty(page);
-	if (unlikely(!clear_page_dirty_for_io(page)))
+	if (unlikely(!clear_page_dirty_for_io(NULL, page)))
 		f2fs_bug_on(sbi, 1);
 
 	/* writeout cp pack 2 page */
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 2532f369cb10..efd09e280b2c 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1459,7 +1459,7 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
 		if (!PageDirty(cc->rpages[i]))
 			goto continue_unlock;
 
-		if (!clear_page_dirty_for_io(cc->rpages[i]))
+		if (!clear_page_dirty_for_io(NULL, cc->rpages[i]))
 			goto continue_unlock;
 
 		ret = f2fs_write_single_data_page(cc->rpages[i], &_submitted,
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 97e816590cd9..f1d622b64690 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3102,7 +3102,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
 				goto continue_unlock;
 			}
 
-			if (!clear_page_dirty_for_io(page))
+			if (!clear_page_dirty_for_io(NULL, page))
 				goto continue_unlock;
 
 #ifdef CONFIG_F2FS_FS_COMPRESSION
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 8e025157f35c..73005e711b83 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -938,7 +938,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
 	if (bit_pos == NR_DENTRY_IN_BLOCK &&
 	    !f2fs_truncate_hole(dir, page->index, page->index + 1)) {
 		f2fs_clear_page_cache_dirty_tag(page);
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 		ClearPageUptodate(page);
 		clear_page_private_gcing(page);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 6e2cae3d2e71..1f647287e3eb 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1361,7 +1361,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
 	f2fs_invalidate_compress_page(fio.sbi, fio.old_blkaddr);
 
 	set_page_dirty(fio.encrypted_page);
-	if (clear_page_dirty_for_io(fio.encrypted_page))
+	if (clear_page_dirty_for_io(NULL, fio.encrypted_page))
 		dec_page_count(fio.sbi, F2FS_DIRTY_META);
 
 	set_page_writeback(fio.encrypted_page);
@@ -1446,7 +1446,7 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
 		f2fs_wait_on_page_writeback(page, DATA, true, true);
 
 		set_page_dirty(page);
-		if (clear_page_dirty_for_io(page)) {
+		if (clear_page_dirty_for_io(NULL, page)) {
 			inode_dec_dirty_pages(inode);
 			f2fs_remove_dirty_inode(inode);
 		}
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 21a495234ffd..2bfade0ead67 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -170,7 +170,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page)
 	set_page_dirty(page);
 
 	/* clear dirty state */
-	dirty = clear_page_dirty_for_io(page);
+	dirty = clear_page_dirty_for_io(NULL, page);
 
 	/* write data page to try to make data consistent */
 	set_page_writeback(page);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index dde4c0458704..6f5571cac2b3 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -124,7 +124,7 @@ static void clear_node_page_dirty(struct page *page)
 {
 	if (PageDirty(page)) {
 		f2fs_clear_page_cache_dirty_tag(page);
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 		dec_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES);
 	}
 	ClearPageUptodate(page);
@@ -1501,7 +1501,7 @@ static void flush_inline_data(struct f2fs_sb_info *sbi, nid_t ino)
 		if (!PageDirty(page))
 			goto page_out;
 
-		if (!clear_page_dirty_for_io(page))
+		if (!clear_page_dirty_for_io(NULL, page))
 			goto page_out;
 
 		ret = f2fs_write_inline_data(inode, page);
@@ -1696,7 +1696,7 @@ int f2fs_move_node_page(struct page *node_page, int gc_type)
 
 		set_page_dirty(node_page);
 
-		if (!clear_page_dirty_for_io(node_page)) {
+		if (!clear_page_dirty_for_io(NULL, node_page)) {
 			err = -EAGAIN;
 			goto out_page;
 		}
@@ -1803,7 +1803,7 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
 				set_page_dirty(page);
 			}
 
-			if (!clear_page_dirty_for_io(page))
+			if (!clear_page_dirty_for_io(NULL, page))
 				goto continue_unlock;
 
 			ret = __write_node_page(page, atomic &&
@@ -2011,7 +2011,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
write_node:
 			f2fs_wait_on_page_writeback(page, NODE, true, true);
 
-			if (!clear_page_dirty_for_io(page))
+			if (!clear_page_dirty_for_io(NULL, page))
 				goto continue_unlock;
 
 			set_fsync_mark(page, 0);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 875314ee6f59..cb9561128b4b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2399,7 +2399,7 @@ static int fuse_write_end(struct file *file, struct address_space *mapping,
 static int fuse_launder_folio(struct folio *folio)
 {
 	int err = 0;
-	if (folio_clear_dirty_for_io(folio)) {
+	if (folio_clear_dirty_for_io(NULL, folio)) {
 		struct inode *inode = folio->mapping->host;
 
 		/* Serialize with pending writeback for the same page */
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index e782b4f1d104..cf784dd5fd3b 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -247,7 +247,7 @@ static int gfs2_write_jdata_pagevec(struct address_space *mapping,
 		}
 
 		BUG_ON(PageWriteback(page));
-		if (!clear_page_dirty_for_io(page))
+		if (!clear_page_dirty_for_io(wbc, page))
 			goto continue_unlock;
 
 		trace_wbc_writepage(wbc, inode_to_bdi(inode));
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 80c240e50952..c07b686c84ce 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -2089,7 +2089,7 @@ int nfs_wb_page(struct inode *inode, struct page *page)
 	for (;;) {
 		wait_on_page_writeback(page);
-		if (clear_page_dirty_for_io(page)) {
+		if (clear_page_dirty_for_io(&wbc, page)) {
 			ret = nfs_writepage_locked(page, &wbc);
 			if (ret < 0)
 				goto out_error;
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 39b7eea2642a..9c3cc20b446e 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -456,7 +456,7 @@ int __nilfs_clear_page_dirty(struct page *page)
 			__xa_clear_mark(&mapping->i_pages, page_index(page),
 					PAGECACHE_TAG_DIRTY);
 			xa_unlock_irq(&mapping->i_pages);
-			return clear_page_dirty_for_io(page);
+			return clear_page_dirty_for_io(NULL, page);
 		}
 		xa_unlock_irq(&mapping->i_pages);
 		return 0;
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 76c3bd88b858..123494739030 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1644,7 +1644,7 @@ static void nilfs_begin_page_io(struct page *page)
 		return;
 
 	lock_page(page);
-	clear_page_dirty_for_io(page);
+	clear_page_dirty_for_io(NULL, page);
 	set_page_writeback(page);
 	unlock_page(page);
 }
@@ -1662,7 +1662,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 			if (bh->b_page != bd_page) {
 				if (bd_page) {
 					lock_page(bd_page);
-					clear_page_dirty_for_io(bd_page);
+					clear_page_dirty_for_io(NULL, bd_page);
 					set_page_writeback(bd_page);
 					unlock_page(bd_page);
 				}
@@ -1676,7 +1676,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 			if (bh == segbuf->sb_super_root) {
 				if (bh->b_page != bd_page) {
 					lock_page(bd_page);
-					clear_page_dirty_for_io(bd_page);
+					clear_page_dirty_for_io(NULL, bd_page);
 					set_page_writeback(bd_page);
 					unlock_page(bd_page);
 					bd_page = bh->b_page;
@@ -1691,7 +1691,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 	}
 	if (bd_page) {
 		lock_page(bd_page);
-		clear_page_dirty_for_io(bd_page);
+		clear_page_dirty_for_io(NULL, bd_page);
 		set_page_writeback(bd_page);
 		unlock_page(bd_page);
 	}
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 4df560894386..829de5553a77 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -501,7 +501,7 @@ static int orangefs_launder_folio(struct folio *folio)
 		.nr_to_write = 0,
 	};
 	folio_wait_writeback(folio);
-	if (folio_clear_dirty_for_io(folio)) {
+	if (folio_clear_dirty_for_io(&wbc, folio)) {
 		r = orangefs_writepage_locked(&folio->page, &wbc);
 		folio_end_writeback(folio);
 	}
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index f2353dd676ef..41d764fd5511 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1159,7 +1159,7 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode,
 		 */
 		ubifs_assert(c, PagePrivate(page));
 
-		clear_page_dirty_for_io(page);
+		clear_page_dirty_for_io(NULL, page);
 
 		if (UBIFS_BLOCKS_PER_PAGE_SHIFT)
 			offset = new_size & (PAGE_SIZE - 1);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 29e1f9e76eb6..81c42a95cf8d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1059,8 +1059,9 @@ static inline void folio_cancel_dirty(struct folio *folio)
 	if (folio_test_dirty(folio))
 		__folio_cancel_dirty(folio);
 }
-bool folio_clear_dirty_for_io(struct folio *folio);
-bool clear_page_dirty_for_io(struct page *page);
+bool folio_clear_dirty_for_io(struct writeback_control *wbc,
+			      struct folio *folio);
+bool clear_page_dirty_for_io(struct writeback_control *wbc, struct page *page);
 void folio_invalidate(struct folio *folio, size_t offset, size_t length);
 int __must_check folio_write_one(struct folio *folio);
 static inline int __must_check write_one_page(struct page *page)
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 69ed25790c68..748f82def674 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -63,9 +63,9 @@ int __set_page_dirty_nobuffers(struct page *page)
 }
 EXPORT_SYMBOL(__set_page_dirty_nobuffers);
 
-bool clear_page_dirty_for_io(struct page *page)
+bool clear_page_dirty_for_io(struct writeback_control *wbc, struct page *page)
 {
-	return folio_clear_dirty_for_io(page_folio(page));
+	return folio_clear_dirty_for_io(wbc, page_folio(page));
 }
 EXPORT_SYMBOL(clear_page_dirty_for_io);
diff --git a/mm/migrate.c b/mm/migrate.c
index a4d3fc65085f..0bda652153b9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -870,7 +870,7 @@ static int writeout(struct address_space *mapping, struct folio *folio)
 		/* No write method for the address space */
 		return -EINVAL;
 
-	if (!folio_clear_dirty_for_io(folio))
+	if (!folio_clear_dirty_for_io(&wbc, folio))
 		/* Someone else already triggered a write */
 		return -EAGAIN;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ad608ef2a243..2d70070e533c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2465,7 +2465,7 @@ int write_cache_pages(struct address_space *mapping,
 			}
 
 			BUG_ON(PageWriteback(page));
-			if (!clear_page_dirty_for_io(page))
+			if (!clear_page_dirty_for_io(wbc, page))
 				goto continue_unlock;
 
 			trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
@@ -2628,7 +2628,7 @@ int folio_write_one(struct folio *folio)
 
 	folio_wait_writeback(folio);
 
-	if (folio_clear_dirty_for_io(folio)) {
+	if (folio_clear_dirty_for_io(&wbc, folio)) {
 		folio_get(folio);
 		ret = mapping->a_ops->writepage(&folio->page, &wbc);
 		if (ret == 0)
@@ -2924,7 +2924,7 @@ EXPORT_SYMBOL(__folio_cancel_dirty);
 
 /*
  * Clear a folio's dirty flag, while caring for dirty memory accounting.
- * Returns true if the folio was previously dirty.
+ * Returns true if the folio was previously dirty and should be written back.
 *
 * This is for preparing to put the folio under writeout.  We leave
 * the folio tagged as dirty in the xarray so that a concurrent
@@ -2935,8 +2935,14 @@ EXPORT_SYMBOL(__folio_cancel_dirty);
 *
 * This incoherency between the folio's dirty flag and xarray tag is
 * unfortunate, but it only exists while the folio is locked.
+ *
+ * If the folio is pinned, its writeback is problematic so we just don't bother
+ * for memory cleaning writeback - this is why writeback control is passed in.
+ * If it is NULL, we assume pinned pages are not expected (e.g. this can be
+ * a metadata page) and warn if the page is actually pinned.
 */
-bool folio_clear_dirty_for_io(struct folio *folio)
+bool folio_clear_dirty_for_io(struct writeback_control *wbc,
+		struct folio *folio)
 {
 	struct address_space *mapping = folio_mapping(folio);
 	bool ret = false;
@@ -2975,6 +2981,16 @@ bool folio_clear_dirty_for_io(struct folio *folio)
 		 */
 		if (folio_mkclean(folio))
 			folio_mark_dirty(folio);
+		/*
+		 * For pinned folios we have no chance to reclaim them anyway
+		 * and we cannot be sure folio is ever clean. So memory
+		 * cleaning writeback is pointless. Just skip it.
+		 */
+		if (wbc && wbc->sync_mode == WB_SYNC_NONE &&
+		    folio_maybe_dma_pinned(folio))
+			return false;
+		/* Catch callers not expecting pinned pages */
+		WARN_ON_ONCE(!wbc && folio_maybe_dma_pinned(folio));
 		/*
 		 * We carefully synchronise fault handlers against
 		 * installing a dirty pte and marking the folio dirty
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ab3911a8b116..71a226b47ac6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1283,7 +1283,7 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
 	if (mapping->a_ops->writepage == NULL)
 		return PAGE_ACTIVATE;
 
-	if (folio_clear_dirty_for_io(folio)) {
+	if (folio_clear_dirty_for_io(NULL, folio)) {
 		int res;
 		struct writeback_control wbc = {
 			.sync_mode = WB_SYNC_NONE,

From patchwork Thu Feb 9 12:31:56 2023
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 13134494
From: Jan Kara <jack@suse.cz>
Cc: John Hubbard, David Howells, David Hildenbrand, Jan Kara
Subject: [PATCH 4/5] block: Add support for bouncing pinned pages
Date: Thu, 9 Feb 2023 13:31:56 +0100
Message-Id: <20230209123206.3548-4-jack@suse.cz>
In-Reply-To: <20230209121046.25360-1-jack@suse.cz>
References: <20230209121046.25360-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

When there is direct IO (or other DMA write) running into a page, it is
not generally safe to submit this page for another IO (such as
writeback) because this can cause checksum failures or similar issues.
However, sometimes we cannot avoid writing out the contents of these
pages, because pages can stay pinned for an extensive amount of time
(e.g. for RDMA). For these cases we need to just bounce the pages if we
really have to write them out. Add support for this type of bouncing to
the block layer infrastructure.
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk.h               | 10 +++++++++-
 block/bounce.c            |  9 +++++++--
 include/linux/blk_types.h |  1 +
 mm/Kconfig                |  8 ++++----
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 4c3b3325219a..def7ab8379bc 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -384,10 +384,18 @@ static inline bool blk_queue_may_bounce(struct request_queue *q)
 		max_low_pfn >= max_pfn;
 }
 
+static inline bool bio_need_pin_bounce(struct bio *bio,
+				       struct request_queue *q)
+{
+	return IS_ENABLED(CONFIG_BOUNCE) &&
+		bio->bi_flags & (1 << BIO_NEED_PIN_BOUNCE);
+}
+
 static inline struct bio *blk_queue_bounce(struct bio *bio,
 		struct request_queue *q)
 {
-	if (unlikely(blk_queue_may_bounce(q) && bio_has_data(bio)))
+	if (unlikely((blk_queue_may_bounce(q) || bio_need_pin_bounce(bio, q)) &&
+		     bio_has_data(bio)))
 		return __blk_queue_bounce(bio, q);
 	return bio;
 }
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..ebda95953d58 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -207,12 +207,16 @@ struct bio *__blk_queue_bounce(struct bio *bio_orig, struct request_queue *q)
 	struct bvec_iter iter;
 	unsigned i = 0, bytes = 0;
 	bool bounce = false;
+	bool pinned_bounce = bio_orig->bi_flags & (1 << BIO_NEED_PIN_BOUNCE);
+	bool highmem_bounce = blk_queue_may_bounce(q);
 	int sectors;
 
 	bio_for_each_segment(from, bio_orig, iter) {
 		if (i++ < BIO_MAX_VECS)
 			bytes += from.bv_len;
-		if (PageHighMem(from.bv_page))
+		if (highmem_bounce && PageHighMem(from.bv_page))
+			bounce = true;
+		if (pinned_bounce && page_maybe_dma_pinned(from.bv_page))
 			bounce = true;
 	}
 	if (!bounce)
@@ -241,7 +245,8 @@ struct bio *__blk_queue_bounce(struct bio *bio_orig, struct request_queue *q)
 	for (i = 0, to = bio->bi_io_vec; i < bio->bi_vcnt; to++, i++) {
 		struct page *bounce_page;
 
-		if (!PageHighMem(to->bv_page))
+		if (!((highmem_bounce && PageHighMem(to->bv_page)) ||
+		      (pinned_bounce && page_maybe_dma_pinned(to->bv_page))))
 			continue;
 
 		bounce_page = mempool_alloc(&page_pool, GFP_NOIO);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..3aa1dc5d8dc6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -321,6 +321,7 @@ enum {
 	BIO_NO_PAGE_REF,	/* don't put release vec pages */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
+	BIO_NEED_PIN_BOUNCE,	/* bio needs to bounce pinned pages */
 	BIO_QUIET,		/* Make BIO Quiet */
 	BIO_CHAIN,		/* chained bio, ->bi_remaining in effect */
 	BIO_REFFED,		/* bio has elevated ->bi_cnt */
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..eba075e959e8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -659,11 +659,11 @@ config PHYS_ADDR_T_64BIT
 config BOUNCE
 	bool "Enable bounce buffers"
 	default y
-	depends on BLOCK && MMU && HIGHMEM
+	depends on BLOCK && MMU
 	help
-	  Enable bounce buffers for devices that cannot access the full range of
-	  memory available to the CPU. Enabled by default when HIGHMEM is
-	  selected, but you may say n to override this.
+	  Enable bounce buffers. This is used for devices that cannot access
+	  the full range of memory available to the CPU or when DMA can be
+	  modifying pages while they are submitted for writeback.
 
 config MMU_NOTIFIER
 	bool

From patchwork Thu Feb 9 12:31:57 2023
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 13134493
From: Jan Kara <jack@suse.cz>
Cc: John Hubbard, David Howells, David Hildenbrand, Jan Kara
Subject: [PATCH 5/5] iomap: Bounce pinned pages during writeback
Date: Thu, 9 Feb 2023 13:31:57 +0100
Message-Id: <20230209123206.3548-5-jack@suse.cz>
In-Reply-To: <20230209121046.25360-1-jack@suse.cz>
References: <20230209121046.25360-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

When there is direct IO (or other DMA write) running into a page, it is
not generally safe to submit this page for writeback because this can
cause DIF/DIX failures or similar issues. Ask the block layer to bounce
the page in this case.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/iomap/buffered-io.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..e6469e7715cc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1563,6 +1563,14 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
 		bio_add_folio(wpc->ioend->io_bio, folio, len, poff);
 	}
 
+	/*
+	 * The folio may be modified by the pin owner and we require stable
+	 * page contents during writeback. Ask block layer to bounce the bio.
+	 */
+	if (inode->i_sb->s_iflags & SB_I_STABLE_WRITES &&
+	    folio_maybe_dma_pinned(folio))
+		wpc->ioend->io_bio->bi_flags |= 1 << BIO_NEED_PIN_BOUNCE;
+
 	if (iop)
 		atomic_add(len, &iop->write_bytes_pending);
 	wpc->ioend->io_size += len;