From patchwork Fri Sep 17 02:56:57 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12500845
Subject: [PATCH 1/6] MM: Support __GFP_NOFAIL in alloc_pages_bulk_*() and improve doco
From: NeilBrown
To: Andrew Morton, Theodore Ts'o, Andreas Dilger, "Darrick J. Wong",
    Matthew Wilcox, Mel Gorman, Michal Hocko, Dave Chinner, Jonathan Corbet
Cc: linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
Date: Fri, 17 Sep 2021 12:56:57 +1000
Message-ID: <163184741776.29351.3565418361661850328.stgit@noble.brown>
In-Reply-To: <163184698512.29351.4735492251524335974.stgit@noble.brown>

When alloc_pages_bulk_array() is called on an array that is already
partially populated, the level of effort to get a single page is less
than when the array was entirely unpopulated.  This behaviour is
inconsistent: one effect of it is that __GFP_NOFAIL will not ensure
that at least one page is allocated.  This patch makes the two cases
behave consistently.

Also clarify the expected success rate.  __alloc_pages_bulk() will
allocate one page according to @gfp, and may allocate more if that can
be done cheaply.  It is assumed that the caller values cheap allocation
where possible, and may decide to use what it has got, or to call again
to get more.

Acked-by: Mel Gorman
Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Signed-off-by: NeilBrown
---
 mm/page_alloc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b37435c274cf..aa51016e49c5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5191,6 +5191,11 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order, * is the maximum number of pages that will be stored in the array. * * Returns the number of pages on the list or array. + * + * At least one page will be allocated if that is possible while + * remaining consistent with @gfp. Extra pages up to the requested + * total will be allocated opportunistically when doing so is + * significantly cheaper than having the caller repeat the request.
*/ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, nodemask_t *nodemask, int nr_pages, @@ -5292,7 +5297,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, pcp, pcp_list); if (unlikely(!page)) { /* Try and get at least one page */ - if (!nr_populated) + if (!nr_account) goto failed_irq; break; }
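To illustrate the behaviour the updated comment promises, consider a
caller that must populate a whole array of pages.  The sketch below is
illustrative only; the function name and calling context are invented,
not taken from the patch:

#include <linux/gfp.h>
#include <linux/mm.h>

/* Keep calling until every slot of pages[] is populated.  With
 * __GFP_NOFAIL each call is now guaranteed to add at least one page,
 * so this loop always makes progress and terminates. */
static void fill_page_array(struct page **pages, unsigned long want)
{
        unsigned long filled = 0;

        while (filled < want)
                filled = alloc_pages_bulk_array(GFP_KERNEL | __GFP_NOFAIL,
                                                want, pages);
}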
From patchwork Fri Sep 17 02:56:57 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12500847
Subject: [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL
From: NeilBrown
Date: Fri, 17 Sep 2021 12:56:57 +1000
Message-ID: <163184741778.29351.16920832234899124642.stgit@noble.brown>
In-Reply-To: <163184698512.29351.4735492251524335974.stgit@noble.brown>

__GFP_NOFAIL is documented both in gfp.h and in memory-allocation.rst,
but the details are not entirely consistent.  This patch ensures that
both places state that:
 - there is a risk of deadlock with reclaim/writeback/oom-kill;
 - it should only be used when there is no real alternative;
 - it is preferable to an endless loop around the allocator;
 - it is strongly discouraged for costly-order allocations.

Signed-off-by: NeilBrown
Acked-by: Vlastimil Babka
---
 Documentation/core-api/memory-allocation.rst | 25 ++++++++++++++++++++++++-
 include/linux/gfp.h | 6 +++++-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst index 5954ddf6ee13..8ea077465446 100644 --- a/Documentation/core-api/memory-allocation.rst +++ b/Documentation/core-api/memory-allocation.rst @@ -126,7 +126,30 @@ or another request. * ``GFP_KERNEL | __GFP_NOFAIL`` - overrides the default allocator behavior and all allocation requests will loop endlessly until they succeed. - This might be really dangerous especially for larger orders. + Any attempt to use ``__GFP_NOFAIL`` for allocations larger than + order-1 (2 pages) will trigger a warning. + + Use of ``__GFP_NOFAIL`` can cause deadlocks so it should only be used + when there is no alternative, and then should be used with caution. + Deadlocks can happen if the calling process holds any resources + (e.g. locks) which might be needed for memory reclaim or write-back, + or which might prevent a process killed by the OOM killer from + successfully exiting. Where possible, locks should be released + before using ``__GFP_NOFAIL``. + + While this flag is best avoided, it is still preferable to endless + loops around the allocator. Endless loops may still be used when + there is a need to test for the process being killed + (fatal_signal_pending(current)). + + * ``GFP_NOFS | __GFP_NOFAIL`` - Loop endlessly instead of failing + when performing allocations in file system code. The same guidance + as for ``GFP_KERNEL | __GFP_NOFAIL`` applies with extra emphasis on + the possibility of deadlocks. ``GFP_NOFS`` often implies that + filesystem locks are held which might lead to blocking reclaim. + Preemptively flushing or reclaiming memory associated with such + locks might be appropriate before requesting a ``__GFP_NOFAIL`` + allocation. Selecting memory allocator ========================== diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 55b2ec1f965a..1d2a89e20b8b 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -209,7 +209,11 @@ struct vm_area_struct; * used only when there is no reasonable failure policy) but it is * definitely preferable to use the flag rather than opencode endless * loop around allocator. - * Using this flag for costly allocations is _highly_ discouraged.
+ * Use of this flag may lead to deadlocks if locks are held which would + * be needed for memory reclaim, write-back, or the timely exit of a + * process killed by the OOM-killer. Dropping any locks not absolutely + * needed is advisable before requesting a %__GFP_NOFAIL allocate. + * Using this flag for costly allocations (order>1) is _highly_ discouraged. */ #define __GFP_IO ((__force gfp_t)___GFP_IO) #define __GFP_FS ((__force gfp_t)___GFP_FS)
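The guidance added above can be summarised in code.  The following
sketch is not from the patch; it simply contrasts the recommended
__GFP_NOFAIL call with the one situation where an open-coded loop is
still reasonable, namely a caller that wants to notice fatal signals:

#include <linux/sched.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>

/* Preferred: let the allocator wait.  The caller must not hold locks
 * that reclaim, write-back or the OOM killer might need. */
static void *grab_buffer(size_t len)
{
        return kmalloc(len, GFP_KERNEL | __GFP_NOFAIL);
}

/* An explicit loop is only justified when the caller needs to give up
 * once it has been fatally signalled. */
static void *grab_buffer_killable(size_t len)
{
        void *p;

        while (!(p = kmalloc(len, GFP_KERNEL))) {
                if (fatal_signal_pending(current))
                        return NULL;
                cond_resched();
        }
        return p;
}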
From patchwork Fri Sep 17 02:56:57 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12500849
Subject: [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops.
From: NeilBrown
Date: Fri, 17 Sep 2021 12:56:57 +1000
Message-ID: <163184741779.29351.15039533971869971199.stgit@noble.brown>
In-Reply-To: <163184698512.29351.4735492251524335974.stgit@noble.brown>

Indefinite loops waiting for memory allocation are discouraged by the
documentation in gfp.h, which says of __GFP_NOFAIL that it is
"definitely preferable to use the flag rather than opencode endless
loop around allocator".

Loops that use congestion_wait() are particularly unwise, as
congestion_wait() is indistinguishable from
schedule_timeout_uninterruptible() in practice, and it should be
deprecated.

So this patch changes the two loops in ext4_ext_truncate() to use
__GFP_NOFAIL instead of looping.  As the allocation happens several
layers deeper in the call stack, this requires passing the
EXT4_EX_NOFAIL flag down and handling it in various places.

__ext4_journal_start_sb() is given a new 'gfp_mask' argument which is
passed through to jbd2__journal_start().  It always receives GFP_NOFS,
except when called through ext4_journal_start_with_revoke() from
ext4_ext_remove_space(), where it can now include __GFP_NOFAIL.
jbd2__journal_start() is enhanced so that the gfp_t flags passed in are
used for *all* of its allocations.
Signed-off-by: NeilBrown --- fs/ext4/ext4.h | 2 +- fs/ext4/ext4_jbd2.c | 4 ++- fs/ext4/ext4_jbd2.h | 14 +++++++----- fs/ext4/extents.c | 53 +++++++++++++++++++--------------------------- fs/ext4/extents_status.c | 35 +++++++++++++++++------------- fs/ext4/extents_status.h | 2 +- fs/ext4/ialloc.c | 3 ++- fs/ext4/indirect.c | 2 +- fs/ext4/inode.c | 6 +++-- fs/ext4/ioctl.c | 4 ++- fs/ext4/super.c | 2 +- fs/jbd2/transaction.c | 8 +++---- 12 files changed, 67 insertions(+), 68 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 90ff5acaf11f..52a34f5dfda2 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3720,7 +3720,7 @@ extern int ext4_ext_map_blocks(handle_t *handle, struct inode *inode, struct ext4_map_blocks *map, int flags); extern int ext4_ext_truncate(handle_t *, struct inode *); extern int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, - ext4_lblk_t end); + ext4_lblk_t end, int nofail); extern void ext4_ext_init(struct super_block *); extern void ext4_ext_release(struct super_block *); extern long ext4_fallocate(struct file *file, int mode, loff_t offset, diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 6def7339056d..780fde556dc0 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -88,7 +88,7 @@ static int ext4_journal_check_start(struct super_block *sb) handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, int type, int blocks, int rsv_blocks, - int revoke_creds) + int revoke_creds, gfp_t gfp_mask) { journal_t *journal; int err; @@ -103,7 +103,7 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) return ext4_get_nojournal(); return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds, - GFP_NOFS, type, line); + gfp_mask, type, line); } int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle) diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 0e4fa644df01..da79be8ffff4 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -263,7 +263,7 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, int type, int blocks, int rsv_blocks, - int revoke_creds); + int revoke_creds, gfp_t gfp_mask); int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle); #define EXT4_NOJOURNAL_MAX_REF_COUNT ((unsigned long) 4096) @@ -304,7 +304,8 @@ static inline int ext4_trans_default_revoke_credits(struct super_block *sb) #define ext4_journal_start_sb(sb, type, nblocks) \ __ext4_journal_start_sb((sb), __LINE__, (type), (nblocks), 0, \ - ext4_trans_default_revoke_credits(sb)) + ext4_trans_default_revoke_credits(sb), \ + GFP_NOFS) #define ext4_journal_start(inode, type, nblocks) \ __ext4_journal_start((inode), __LINE__, (type), (nblocks), 0, \ @@ -314,9 +315,9 @@ static inline int ext4_trans_default_revoke_credits(struct super_block *sb) __ext4_journal_start((inode), __LINE__, (type), (blocks), (rsv_blocks),\ ext4_trans_default_revoke_credits((inode)->i_sb)) -#define ext4_journal_start_with_revoke(inode, type, blocks, revoke_creds) \ - __ext4_journal_start((inode), __LINE__, (type), (blocks), 0, \ - (revoke_creds)) +#define ext4_journal_start_with_revoke(gfp_flags, inode, type, blocks, revoke_creds) \ + __ext4_journal_start_sb((inode)->i_sb, __LINE__, (type), (blocks),\ + 0, (revoke_creds), (gfp_flags)) static inline handle_t *__ext4_journal_start(struct inode *inode, unsigned int line, 
int type, @@ -324,7 +325,8 @@ static inline handle_t *__ext4_journal_start(struct inode *inode, int revoke_creds) { return __ext4_journal_start_sb(inode->i_sb, line, type, blocks, - rsv_blocks, revoke_creds); + rsv_blocks, revoke_creds, + GFP_NOFS); } #define ext4_journal_stop(handle) \ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index c0de30f25185..c6c16aa8b618 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1488,7 +1488,7 @@ static int ext4_ext_search_left(struct inode *inode, static int ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path, ext4_lblk_t *logical, ext4_fsblk_t *phys, - struct ext4_extent *ret_ex) + struct ext4_extent *ret_ex, int nofail) { struct buffer_head *bh = NULL; struct ext4_extent_header *eh; @@ -1565,7 +1565,7 @@ static int ext4_ext_search_right(struct inode *inode, while (++depth < path->p_depth) { /* subtract from p_depth to get proper eh_depth */ bh = read_extent_tree_block(inode, block, - path->p_depth - depth, 0); + path->p_depth - depth, nofail); if (IS_ERR(bh)) return PTR_ERR(bh); eh = ext_block_hdr(bh); @@ -1574,7 +1574,7 @@ static int ext4_ext_search_right(struct inode *inode, put_bh(bh); } - bh = read_extent_tree_block(inode, block, path->p_depth - depth, 0); + bh = read_extent_tree_block(inode, block, path->p_depth - depth, nofail); if (IS_ERR(bh)) return PTR_ERR(bh); eh = ext_block_hdr(bh); @@ -2773,7 +2773,7 @@ ext4_ext_more_to_rm(struct ext4_ext_path *path) } int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, - ext4_lblk_t end) + ext4_lblk_t end, int nofail) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); int depth = ext_depth(inode); @@ -2781,6 +2781,10 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, struct partial_cluster partial; handle_t *handle; int i = 0, err = 0; + gfp_t gfp_mask = GFP_NOFS; + + if (nofail) + gfp_mask |= __GFP_NOFAIL; partial.pclu = 0; partial.lblk = 0; @@ -2789,7 +2793,8 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, ext_debug(inode, "truncate since %u to %u\n", start, end); /* probably first extent we're gonna free will be last in block */ - handle = ext4_journal_start_with_revoke(inode, EXT4_HT_TRUNCATE, + handle = ext4_journal_start_with_revoke(gfp_mask, + inode, EXT4_HT_TRUNCATE, depth + 1, ext4_free_metadata_revoke_credits(inode->i_sb, depth)); if (IS_ERR(handle)) @@ -2877,7 +2882,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, */ lblk = ex_end + 1; err = ext4_ext_search_right(inode, path, &lblk, &pblk, - NULL); + NULL, nofail); if (err < 0) goto out; if (pblk) { @@ -2899,10 +2904,6 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, } else { path = kcalloc(depth + 1, sizeof(struct ext4_ext_path), GFP_NOFS | __GFP_NOFAIL); - if (path == NULL) { - ext4_journal_stop(handle); - return -ENOMEM; - } path[0].p_maxdepth = path[0].p_depth = depth; path[0].p_hdr = ext_inode_hdr(inode); i = 0; @@ -2955,7 +2956,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, memset(path + i + 1, 0, sizeof(*path)); bh = read_extent_tree_block(inode, ext4_idx_pblock(path[i].p_idx), depth - i - 1, - EXT4_EX_NOCACHE); + EXT4_EX_NOCACHE | nofail); if (IS_ERR(bh)) { /* should we reset i_size? 
*/ err = PTR_ERR(bh); @@ -4186,7 +4187,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode, if (err) goto out; ar.lright = map->m_lblk; - err = ext4_ext_search_right(inode, path, &ar.lright, &ar.pright, &ex2); + err = ext4_ext_search_right(inode, path, &ar.lright, &ar.pright, &ex2, 0); if (err < 0) goto out; @@ -4368,23 +4369,13 @@ int ext4_ext_truncate(handle_t *handle, struct inode *inode) last_block = (inode->i_size + sb->s_blocksize - 1) >> EXT4_BLOCK_SIZE_BITS(sb); -retry: err = ext4_es_remove_extent(inode, last_block, - EXT_MAX_BLOCKS - last_block); - if (err == -ENOMEM) { - cond_resched(); - congestion_wait(BLK_RW_ASYNC, HZ/50); - goto retry; - } + EXT_MAX_BLOCKS - last_block, + EXT4_EX_NOFAIL); if (err) return err; -retry_remove_space: - err = ext4_ext_remove_space(inode, last_block, EXT_MAX_BLOCKS - 1); - if (err == -ENOMEM) { - cond_resched(); - congestion_wait(BLK_RW_ASYNC, HZ/50); - goto retry_remove_space; - } + err = ext4_ext_remove_space(inode, last_block, EXT_MAX_BLOCKS - 1, + EXT4_EX_NOFAIL); return err; } @@ -5322,13 +5313,13 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) ext4_discard_preallocations(inode, 0); ret = ext4_es_remove_extent(inode, punch_start, - EXT_MAX_BLOCKS - punch_start); + EXT_MAX_BLOCKS - punch_start, 0); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); goto out_stop; } - ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1); + ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1, 0); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); goto out_stop; @@ -5510,7 +5501,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) } ret = ext4_es_remove_extent(inode, offset_lblk, - EXT_MAX_BLOCKS - offset_lblk); + EXT_MAX_BLOCKS - offset_lblk, 0); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); goto out_stop; @@ -5574,10 +5565,10 @@ ext4_swap_extents(handle_t *handle, struct inode *inode1, BUG_ON(!inode_is_locked(inode1)); BUG_ON(!inode_is_locked(inode2)); - *erp = ext4_es_remove_extent(inode1, lblk1, count); + *erp = ext4_es_remove_extent(inode1, lblk1, count, 0); if (unlikely(*erp)) return 0; - *erp = ext4_es_remove_extent(inode2, lblk2, count); + *erp = ext4_es_remove_extent(inode2, lblk2, count, 0); if (unlikely(*erp)) return 0; diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index 9a3a8996aacf..7f7711a2ea44 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -144,9 +144,10 @@ static struct kmem_cache *ext4_es_cachep; static struct kmem_cache *ext4_pending_cachep; -static int __es_insert_extent(struct inode *inode, struct extent_status *newes); +static int __es_insert_extent(struct inode *inode, struct extent_status *newes, + int nofail); static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk, - ext4_lblk_t end, int *reserved); + ext4_lblk_t end, int *reserved, int nofail); static int es_reclaim_extents(struct ext4_inode_info *ei, int *nr_to_scan); static int __es_shrink(struct ext4_sb_info *sbi, int nr_to_scan, struct ext4_inode_info *locked_ei); @@ -452,10 +453,11 @@ static void ext4_es_list_del(struct inode *inode) static struct extent_status * ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len, - ext4_fsblk_t pblk) + ext4_fsblk_t pblk, int nofail) { struct extent_status *es; - es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC); + es = kmem_cache_alloc(ext4_es_cachep, + GFP_ATOMIC | (nofail ? 
__GFP_NOFAIL : 0)); if (es == NULL) return NULL; es->es_lblk = lblk; @@ -754,7 +756,8 @@ static inline void ext4_es_insert_extent_check(struct inode *inode, } #endif -static int __es_insert_extent(struct inode *inode, struct extent_status *newes) +static int __es_insert_extent(struct inode *inode, struct extent_status *newes, + int nofail) { struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree; struct rb_node **p = &tree->root.rb_node; @@ -795,7 +798,7 @@ static int __es_insert_extent(struct inode *inode, struct extent_status *newes) } es = ext4_es_alloc_extent(inode, newes->es_lblk, newes->es_len, - newes->es_pblk); + newes->es_pblk, nofail); if (!es) return -ENOMEM; rb_link_node(&es->rb_node, parent, p); @@ -848,11 +851,11 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, ext4_es_insert_extent_check(inode, &newes); write_lock(&EXT4_I(inode)->i_es_lock); - err = __es_remove_extent(inode, lblk, end, NULL); + err = __es_remove_extent(inode, lblk, end, NULL, 0); if (err != 0) goto error; retry: - err = __es_insert_extent(inode, &newes); + err = __es_insert_extent(inode, &newes, 0); if (err == -ENOMEM && __es_shrink(EXT4_SB(inode->i_sb), 128, EXT4_I(inode))) goto retry; @@ -902,7 +905,7 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk); if (!es || es->es_lblk > end) - __es_insert_extent(inode, &newes); + __es_insert_extent(inode, &newes, 0); write_unlock(&EXT4_I(inode)->i_es_lock); } @@ -1294,6 +1297,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end, * @lblk - first block in range * @end - last block in range * @reserved - number of cluster reservations released + * @nofail - EXT4_EX_NOFAIL if __GFP_NOFAIL should be used * * If @reserved is not NULL and delayed allocation is enabled, counts * block/cluster reservations freed by removing range and if bigalloc @@ -1301,7 +1305,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end, * error code on failure. */ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk, - ext4_lblk_t end, int *reserved) + ext4_lblk_t end, int *reserved, int nofail) { struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree; struct rb_node *node; @@ -1350,7 +1354,7 @@ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk, orig_es.es_len - len2; ext4_es_store_pblock_status(&newes, block, ext4_es_status(&orig_es)); - err = __es_insert_extent(inode, &newes); + err = __es_insert_extent(inode, &newes, nofail); if (err) { es->es_lblk = orig_es.es_lblk; es->es_len = orig_es.es_len; @@ -1426,12 +1430,13 @@ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk, * @inode - file containing range * @lblk - first block in range * @len - number of blocks to remove + * @nofail - EXT4_EX_NOFAIL if __GFP_NOFAIL should be used * * Reduces block/cluster reservation count and for bigalloc cancels pending * reservations as needed. Returns 0 on success, error code on failure. */ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, - ext4_lblk_t len) + ext4_lblk_t len, int nofail) { ext4_lblk_t end; int err = 0; @@ -1456,7 +1461,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, * is reclaimed. 
*/ write_lock(&EXT4_I(inode)->i_es_lock); - err = __es_remove_extent(inode, lblk, end, &reserved); + err = __es_remove_extent(inode, lblk, end, &reserved, nofail); write_unlock(&EXT4_I(inode)->i_es_lock); ext4_es_print_tree(inode); ext4_da_release_space(inode, reserved); @@ -2003,11 +2008,11 @@ int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk, write_lock(&EXT4_I(inode)->i_es_lock); - err = __es_remove_extent(inode, lblk, lblk, NULL); + err = __es_remove_extent(inode, lblk, lblk, NULL, 0); if (err != 0) goto error; retry: - err = __es_insert_extent(inode, &newes); + err = __es_insert_extent(inode, &newes, 0); if (err == -ENOMEM && __es_shrink(EXT4_SB(inode->i_sb), 128, EXT4_I(inode))) goto retry; diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h index 4ec30a798260..23d77094a165 100644 --- a/fs/ext4/extents_status.h +++ b/fs/ext4/extents_status.h @@ -134,7 +134,7 @@ extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk, unsigned int status); extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, - ext4_lblk_t len); + ext4_lblk_t len, int nofail); extern void ext4_es_find_extent_range(struct inode *inode, int (*match_fn)(struct extent_status *es), ext4_lblk_t lblk, ext4_lblk_t end, diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index f73e5eb43eae..82049fb94314 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1079,7 +1079,8 @@ struct inode *__ext4_new_inode(struct user_namespace *mnt_userns, BUG_ON(nblocks <= 0); handle = __ext4_journal_start_sb(dir->i_sb, line_no, handle_type, nblocks, 0, - ext4_trans_default_revoke_credits(sb)); + ext4_trans_default_revoke_credits(sb), + GFP_NOFS); if (IS_ERR(handle)) { err = PTR_ERR(handle); ext4_std_error(sb, err); diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c index 89efa78ed4b2..910e87aea7be 100644 --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -1125,7 +1125,7 @@ void ext4_ind_truncate(handle_t *handle, struct inode *inode) return; } - ext4_es_remove_extent(inode, last_block, EXT_MAX_BLOCKS - last_block); + ext4_es_remove_extent(inode, last_block, EXT_MAX_BLOCKS - last_block, 0); /* * The orphan list entry will now protect us from any crash which diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index d18852d6029c..24246043d94b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1575,7 +1575,7 @@ static void mpage_release_unused_pages(struct mpage_da_data *mpd, ext4_lblk_t start, last; start = index << (PAGE_SHIFT - inode->i_blkbits); last = end << (PAGE_SHIFT - inode->i_blkbits); - ext4_es_remove_extent(inode, start, last - start + 1); + ext4_es_remove_extent(inode, start, last - start + 1, 0); } pagevec_init(&pvec); @@ -4109,7 +4109,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) ext4_discard_preallocations(inode, 0); ret = ext4_es_remove_extent(inode, first_block, - stop_block - first_block); + stop_block - first_block, 0); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); goto out_stop; @@ -4117,7 +4117,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) ret = ext4_ext_remove_space(inode, first_block, - stop_block - 1); + stop_block - 1, 0); else ret = ext4_ind_remove_space(handle, inode, first_block, stop_block); diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 606dee9e08a3..e4de05a6b976 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -79,8 +79,8 @@ static void swap_inode_data(struct inode *inode1, struct inode 
*inode2) (ei1->i_flags & ~EXT4_FL_SHOULD_SWAP); ei2->i_flags = tmp | (ei2->i_flags & ~EXT4_FL_SHOULD_SWAP); swap(ei1->i_disksize, ei2->i_disksize); - ext4_es_remove_extent(inode1, 0, EXT_MAX_BLOCKS); - ext4_es_remove_extent(inode2, 0, EXT_MAX_BLOCKS); + ext4_es_remove_extent(inode1, 0, EXT_MAX_BLOCKS, 0); + ext4_es_remove_extent(inode2, 0, EXT_MAX_BLOCKS, 0); isize = i_size_read(inode1); i_size_write(inode1, i_size_read(inode2)); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 0775950ee84e..947e8376a35a 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1393,7 +1393,7 @@ void ext4_clear_inode(struct inode *inode) invalidate_inode_buffers(inode); clear_inode(inode); ext4_discard_preallocations(inode, 0); - ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS); + ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS, 0); dquot_drop(inode); if (EXT4_I(inode)->jinode) { jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode), diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 6a3caedd2285..23e0f003d43b 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -476,9 +476,9 @@ static int start_this_handle(journal_t *journal, handle_t *handle, } /* Allocate a new handle. This should probably be in a slab... */ -static handle_t *new_handle(int nblocks) +static handle_t *new_handle(int nblocks, gfp_t gfp) { - handle_t *handle = jbd2_alloc_handle(GFP_NOFS); + handle_t *handle = jbd2_alloc_handle(gfp); if (!handle) return NULL; handle->h_total_credits = nblocks; @@ -505,13 +505,13 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, nblocks += DIV_ROUND_UP(revoke_records, journal->j_revoke_records_per_block); - handle = new_handle(nblocks); + handle = new_handle(nblocks, gfp_mask); if (!handle) return ERR_PTR(-ENOMEM); if (rsv_blocks) { handle_t *rsv_handle; - rsv_handle = new_handle(rsv_blocks, gfp_mask); + rsv_handle = new_handle(rsv_blocks, gfp_mask); if (!rsv_handle) { jbd2_free_handle(handle); return ERR_PTR(-ENOMEM);
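The shape of this conversion, with the retry policy decided once at the
top and the gfp flags threaded down to where the allocation actually
happens, can be sketched generically.  The helpers below are
hypothetical and merely stand in for the ext4/jbd2 call chain:

#include <linux/slab.h>
#include <linux/types.h>

struct record { int id; };

/* Lowest level: the allocation site simply honours the flags it is
 * handed, as jbd2__journal_start() now does. */
static struct record *record_alloc(gfp_t gfp_mask)
{
        return kmalloc(sizeof(struct record), gfp_mask);
}

/* Top level: instead of catching -ENOMEM and spinning in a
 * congestion_wait() loop, the caller states once that failure is not
 * an option and lets the allocator do the waiting. */
static struct record *record_get(bool nofail)
{
        gfp_t gfp_mask = GFP_NOFS;

        if (nofail)
                gfp_mask |= __GFP_NOFAIL;
        return record_alloc(gfp_mask);
}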
From patchwork Fri Sep 17 02:56:57 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12500851
Subject: [PATCH 4/6] EXT4: remove congestion_wait from ext4_bio_write_page, and simplify
From: NeilBrown
Date: Fri, 17 Sep 2021 12:56:57 +1000
Message-ID: <163184741781.29351.8660877195340279243.stgit@noble.brown>
In-Reply-To: <163184698512.29351.4735492251524335974.stgit@noble.brown>

congestion_wait() is indistinguishable from
schedule_timeout_uninterruptible() in practice.  It is best avoided and
should be deprecated.  It is not needed in ext4_bio_write_page().
There are two cases.

If there is no ->io_bio yet, then it is appropriate to use
__GFP_NOFAIL, which does the waiting in a better place.  The code
already uses this flag on the second attempt; this patch changes it to
always use the flag in this case.

If there *is* an ->io_bio (in which case the allocation was
non-blocking), we submit the io and retry, which brings us back to the
first case.  No waiting is needed here.

So remove the congestion_wait() call and simplify the code so that the
two cases are clearer.  Also remove the "if (io->io_bio)" test before
calling ext4_io_submit(), as that test is performed internally by that
function.

Signed-off-by: NeilBrown
---
 fs/ext4/page-io.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index f038d578d8d8..3b6ece0d3ad6 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -506,7 +506,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io, * can't happen in the common case of blocksize == PAGE_SIZE.
*/ if (fscrypt_inode_uses_fs_layer_crypto(inode) && nr_to_submit) { - gfp_t gfp_flags = GFP_NOFS; + gfp_t gfp_flags; unsigned int enc_bytes = round_up(len, i_blocksize(inode)); /* @@ -514,21 +514,18 @@ int ext4_bio_write_page(struct ext4_io_submit *io, * a waiting mask (i.e. request guaranteed allocation) on the * first page of the bio. Otherwise it can deadlock. */ + retry_encrypt: if (io->io_bio) gfp_flags = GFP_NOWAIT | __GFP_NOWARN; - retry_encrypt: + else + gfp_flags = GFP_NOFS | __GFP_NOFAIL; bounce_page = fscrypt_encrypt_pagecache_blocks(page, enc_bytes, 0, gfp_flags); if (IS_ERR(bounce_page)) { ret = PTR_ERR(bounce_page); if (ret == -ENOMEM && (io->io_bio || wbc->sync_mode == WB_SYNC_ALL)) { - gfp_flags = GFP_NOFS; - if (io->io_bio) - ext4_io_submit(io); - else - gfp_flags |= __GFP_NOFAIL; - congestion_wait(BLK_RW_ASYNC, HZ/50); + ext4_io_submit(io); goto retry_encrypt; }
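The two cases described in the changelog reduce to a simple pattern:
try a cheap non-blocking allocation while earlier work is still queued,
and fall back to a blocking __GFP_NOFAIL allocation once that work has
been submitted.  The sketch below mirrors that control flow with
invented names; it is not the ext4 code itself:

#include <linux/slab.h>
#include <linux/types.h>

static void *get_bounce_buffer(size_t len, bool have_queued_io,
                               void (*submit_queued)(void))
{
        gfp_t gfp_flags;
        void *buf;

retry:
        if (have_queued_io)
                gfp_flags = GFP_NOWAIT | __GFP_NOWARN;
        else
                gfp_flags = GFP_NOFS | __GFP_NOFAIL;

        buf = kmalloc(len, gfp_flags);
        if (!buf && have_queued_io) {
                submit_queued();        /* now it is safe to block */
                have_queued_io = false;
                goto retry;
        }
        return buf;     /* non-NULL once __GFP_NOFAIL was used */
}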
From patchwork Fri Sep 17 02:56:57 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12500853
Subject: [PATCH 5/6] XFS: remove congestion_wait() loop from kmem_alloc()
From: NeilBrown
Date: Fri, 17 Sep 2021 12:56:57 +1000
Message-ID: <163184741781.29351.4475236694432020436.stgit@noble.brown>
In-Reply-To: <163184698512.29351.4735492251524335974.stgit@noble.brown>

The documentation comment in gfp.h discourages indefinite retry loops
on ENOMEM and says of __GFP_NOFAIL that it is "definitely preferable to
use the flag rather than opencode endless loop around allocator".

So remove the loop and instead specify __GFP_NOFAIL when KM_MAYFAIL was
not given.  As we no longer have the opportunity to report a warning
after repeated failures, clear __GFP_NOWARN so that the allocator's
default warning (rate-limited to one every 10 seconds) is reported
instead.

Signed-off-by: NeilBrown
---
 fs/xfs/kmem.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c index 6f49bf39183c..575a58e61391 100644 --- a/fs/xfs/kmem.c +++ b/fs/xfs/kmem.c @@ -11,21 +11,14 @@ void * kmem_alloc(size_t size, xfs_km_flags_t flags) { - int retries = 0; gfp_t lflags = kmem_flags_convert(flags); - void *ptr; trace_kmem_alloc(size, flags, _RET_IP_); - do { - ptr = kmalloc(size, lflags); - if (ptr || (flags & KM_MAYFAIL)) - return ptr; - if (!(++retries % 100)) - xfs_err(NULL, - "%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)", - current->comm, current->pid, - (unsigned int)size, __func__, lflags); - congestion_wait(BLK_RW_ASYNC, HZ/50); - } while (1); + if (!(flags & KM_MAYFAIL)) { + lflags |= __GFP_NOFAIL; + lflags &= ~__GFP_NOWARN; + } + + return kmalloc(size, lflags); }
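For reference, the effect of the new kmem_alloc() on its two classes of
callers can be restated as a small sketch.  This is a simplification:
the real code derives its gfp flags with kmem_flags_convert(), which is
not reproduced here, and the base flags shown are assumed:

#include <linux/slab.h>
#include <linux/types.h>

/* mayfail == false: the allocator retries internally (__GFP_NOFAIL)
 * and its own rate-limited warning replaces the old xfs_err() message.
 * mayfail == true: behaviour is unchanged and the caller may see NULL. */
static void *xfs_style_alloc(size_t size, bool mayfail)
{
        gfp_t lflags = GFP_KERNEL | __GFP_NOWARN;

        if (!mayfail) {
                lflags |= __GFP_NOFAIL;
                lflags &= ~__GFP_NOWARN;
        }
        return kmalloc(size, lflags);
}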
From patchwork Fri Sep 17 02:56:57 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12500855
Subject: [PATCH 6/6] XFS: remove congestion_wait() loop from xfs_buf_alloc_pages()
From: NeilBrown
Date: Fri, 17 Sep 2021 12:56:57 +1000
Message-ID: <163184741782.29351.3052773526186670351.stgit@noble.brown>
In-Reply-To: <163184698512.29351.4735492251524335974.stgit@noble.brown>

The documentation comment in gfp.h discourages indefinite retry loops
on ENOMEM and says of __GFP_NOFAIL that it is "definitely preferable to
use the flag rather than opencode endless loop around allocator".

congestion_wait() is indistinguishable from
schedule_timeout_uninterruptible() in practice, and it is not a good
way to wait for memory to become available.

So add __GFP_NOFAIL to the gfp flags when failure is not an option, and
remove the congestion_wait().
We now only loop when failure is an option and alloc_pages_bulk_array()
made some progress, but not enough.

Signed-off-by: NeilBrown
---
 fs/xfs/xfs_buf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 5fa6cd947dd4..b19ab52c551b 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -352,7 +352,7 @@ xfs_buf_alloc_pages( if (flags & XBF_READ_AHEAD) gfp_mask |= __GFP_NORETRY; else - gfp_mask |= GFP_NOFS; + gfp_mask |= GFP_NOFS | __GFP_NOFAIL; /* Make sure that we have a page list */ bp->b_page_count = DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE); @@ -372,8 +372,9 @@ xfs_buf_alloc_pages( /* * Bulk filling of pages can take multiple calls. Not filling the entire - * array is not an allocation failure, so don't back off if we get at - * least one extra page. + * array is not an allocation failure but is worth counting in + * xb_pages_retries statistics. If we don't even get one page, + * then this must be a READ_AHEAD and we should abort. */ for (;;) { long last = filled; @@ -385,16 +386,13 @@ xfs_buf_alloc_pages( break; } - if (filled != last) - continue; - - if (flags & XBF_READ_AHEAD) { + if (filled == last) { + ASSERT(flags & XBF_READ_AHEAD); xfs_buf_free_pages(bp); return -ENOMEM; } XFS_STATS_INC(bp->b_mount, xb_page_retries); - congestion_wait(BLK_RW_ASYNC, HZ / 50); } return 0; }
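The resulting loop structure can be shown in isolation.  This is a
simplified sketch of the pattern xfs_buf_alloc_pages() now follows,
with invented names, where 'can_fail' stands for the XBF_READ_AHEAD
case and the statistics counter is omitted:

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/types.h>

static int fill_buffer_pages(struct page **pages, long count, bool can_fail)
{
        gfp_t gfp_mask = __GFP_NOWARN;
        long filled = 0;

        if (can_fail)
                gfp_mask |= __GFP_NORETRY;
        else
                gfp_mask |= GFP_NOFS | __GFP_NOFAIL;

        for (;;) {
                long last = filled;

                filled = alloc_pages_bulk_array(gfp_mask, count, pages);
                if (filled == count)
                        return 0;       /* array fully populated */
                if (filled == last) {
                        /* No progress: only reachable when failure is
                         * allowed, since __GFP_NOFAIL now guarantees
                         * at least one page per call. */
                        return -ENOMEM;
                }
                /* Partial progress: simply call again. */
        }
}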