From patchwork Thu Nov 7 23:56:09 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13867326
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 1/6] mm: add AS_WRITEBACK_MAY_BLOCK mapping flag
Date: Thu, 7 Nov 2024 15:56:09 -0800
Message-ID: <20241107235614.3637221-2-joannelkoong@gmail.com>
In-Reply-To: <20241107235614.3637221-1-joannelkoong@gmail.com>
References: <20241107235614.3637221-1-joannelkoong@gmail.com>

Add a new mapping flag, AS_WRITEBACK_MAY_BLOCK, which filesystems may set
to indicate that writeback operations may block or take an indeterminate
amount of time to complete. Extra caution should be taken when waiting on
writeback for folios belonging to mappings where this flag is set.
Signed-off-by: Joanne Koong
---
 include/linux/pagemap.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 68a5f1ff3301..eb5a7837e142 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
+	AS_WRITEBACK_MAY_BLOCK = 9,	/* Use caution when waiting on writeback */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -335,6 +336,16 @@ static inline bool mapping_inaccessible(struct address_space *mapping)
 	return test_bit(AS_INACCESSIBLE, &mapping->flags);
 }

+static inline void mapping_set_writeback_may_block(struct address_space *mapping)
+{
+	set_bit(AS_WRITEBACK_MAY_BLOCK, &mapping->flags);
+}
+
+static inline bool mapping_writeback_may_block(struct address_space *mapping)
+{
+	return test_bit(AS_WRITEBACK_MAY_BLOCK, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;

From patchwork Thu Nov 7 23:56:10 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13867327
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block
Date: Thu, 7 Nov 2024 15:56:10 -0800
Message-ID: <20241107235614.3637221-3-joannelkoong@gmail.com>
In-Reply-To: <20241107235614.3637221-1-joannelkoong@gmail.com>

Currently in shrink_folio_list(), reclaim for folios under writeback falls
into 3 different cases:
1) Reclaim is encountering an excessive number of folios under writeback
   and this folio has both the writeback and reclaim flags set
2) Dirty throttling is enabled (this happens if reclaim through cgroup is
   not enabled, if reclaim through cgroupv2 memcg is enabled, or if
   reclaim is on the root cgroup), or the folio is not marked for
   immediate reclaim, or the caller does not have __GFP_FS (or __GFP_IO
   if it's going to swap) set
3) Legacy cgroupv1 encounters a folio that already has the reclaim flag
   set and the caller did not have __GFP_FS (or __GFP_IO if swap) set

In cases 1) and 2), we activate the folio and skip reclaiming it, while in
case 3) we wait for writeback to finish on the folio and then try to
reclaim the folio again. In case 3), we wait on writeback because cgroupv1
does not have dirty folio throttling; as such, this is a mitigation
against the case where there are too many folios in writeback with nothing
else to reclaim.
The issue is that for filesystems where writeback may block, sub-optimal
workarounds may need to be put in place to avoid a potential deadlock
arising from reclaim waiting on writeback. (Even though case 3) above is
rare given that legacy cgroupv1 is on its way to being deprecated, it
still needs to be accounted for.) For example, FUSE filesystems allocate a
temp page per dirty page and copy the dirty page's contents over to the
temp page so that writeback can be cleared on the dirty page immediately,
in order to avoid the following deadlock:

* a single-threaded FUSE server is in the middle of handling a request
  that needs a memory allocation
* the memory allocation triggers direct reclaim
* direct reclaim waits on a folio under writeback (e.g. falls into case
  3) above) that needs to be written back to the FUSE server
* the FUSE server can't write back the folio since it's stuck in direct
  reclaim

In this commit, if legacy memcg encounters a folio with the reclaim flag
set (case 3)) and the folio belongs to a mapping that has the
AS_WRITEBACK_MAY_BLOCK flag set, the folio will be activated and skip
reclaim (defaulting to the behavior in case 2)) instead. This allows the
sub-optimal workarounds added to address the "reclaim waits on writeback"
deadlock scenario to be removed.

Signed-off-by: Joanne Koong
---
 mm/vmscan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749cdc110c74..e9755cb7211b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1110,6 +1110,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		if (writeback && folio_test_reclaim(folio))
 			stat->nr_congested += nr_pages;

+		mapping = folio_mapping(folio);
+
 		/*
 		 * If a folio at the tail of the LRU is under writeback, there
 		 * are three cases to consider.
@@ -1129,8 +1131,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 * 2) Global or new memcg reclaim encounters a folio that is
 		 *    not marked for immediate reclaim, or the caller does not
 		 *    have __GFP_FS (or __GFP_IO if it's simply going to swap,
-		 *    not to fs). In this case mark the folio for immediate
-		 *    reclaim and continue scanning.
+		 *    not to fs), or writebacks in the mapping may block.
+		 *    In this case mark the folio for immediate reclaim and
+		 *    continue scanning.
 		 *
 		 * Require may_enter_fs() because we would wait on fs, which
 		 * may not have submitted I/O yet. And the loop driver might
@@ -1165,7 +1168,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			/* Case 2 above */
 			} else if (writeback_throttling_sane(sc) ||
 			    !folio_test_reclaim(folio) ||
-			    !may_enter_fs(folio, sc->gfp_mask)) {
+			    !may_enter_fs(folio, sc->gfp_mask) ||
+			    (mapping && mapping_writeback_may_block(mapping))) {
 				/*
 				 * This is slightly racy -
 				 * folio_end_writeback() might have

From patchwork Thu Nov 7 23:56:11 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13867328
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 3/6] fs/writeback: in wait_sb_inodes(), skip wait for AS_WRITEBACK_MAY_BLOCK mappings
Date: Thu, 7 Nov 2024 15:56:11 -0800
Message-ID: <20241107235614.3637221-4-joannelkoong@gmail.com>
In-Reply-To: <20241107235614.3637221-1-joannelkoong@gmail.com>

For filesystems with the AS_WRITEBACK_MAY_BLOCK flag set, writeback
operations may block or take an indeterminate amount of time to complete.
For example, writing data back to disk in FUSE filesystems depends on the
userspace server successfully completing writeback.

In this commit, wait_sb_inodes() skips waiting on writeback if the inode's
mapping has AS_WRITEBACK_MAY_BLOCK set; otherwise sync(2) may take an
indeterminate amount of time to complete. Callers who wish to ensure the
data for such a mapping has actually been written back to disk should use
fsync(2)/fdatasync(2) instead.

Signed-off-by: Joanne Koong
---
 fs/fs-writeback.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d8bec3c1bb1f..c80c45972162 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2659,6 +2659,9 @@ static void wait_sb_inodes(struct super_block *sb)
 		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
 			continue;

+		if (mapping_writeback_may_block(mapping))
+			continue;
+
 		spin_unlock_irq(&sb->s_inode_wblist_lock);

 		spin_lock(&inode->i_lock);

From patchwork Thu Nov 7 23:56:12 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13867329
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails
Date: Thu, 7 Nov 2024 15:56:12 -0800
Message-ID: <20241107235614.3637221-5-joannelkoong@gmail.com>
In-Reply-To: <20241107235614.3637221-1-joannelkoong@gmail.com>

In offline_pages(), do_migrate_range() may potentially retry forever if
the migration fails. Add a return value to do_migrate_range(), and allow
offline_pages() to try migrating pages 5 times before erroring out,
similar to how migration failures in __alloc_contig_migrate_range() are
handled.

Signed-off-by: Joanne Koong
---
 mm/memory_hotplug.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 621ae1015106..49402442ea3b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1770,13 +1770,14 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 	return 0;
 }

-static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct folio *folio;
 	unsigned long pfn;
 	LIST_HEAD(source);
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
+	int ret = 0;

 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		struct page *page;
@@ -1833,7 +1834,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
 			.reason = MR_MEMORY_HOTPLUG,
 		};
-		int ret;

 		/*
 		 * We have checked that migration range is on a single zone so
@@ -1863,6 +1863,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			putback_movable_pages(&source);
 		}
 	}
+	return ret;
 }

 static int __init cmdline_parse_movable_node(char *p)
@@ -1940,6 +1941,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	const int node = zone_to_nid(zone);
 	unsigned long flags;
 	struct memory_notify arg;
+	unsigned int tries = 0;
 	char *reason;
 	int ret;
@@ -2028,11 +2030,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		ret = scan_movable_pages(pfn, end_pfn, &pfn);
 		if (!ret) {
-			/*
-			 * TODO: fatal migration failures should bail
-			 * out
-			 */
-			do_migrate_range(pfn, end_pfn);
+			if (do_migrate_range(pfn, end_pfn) && ++tries == 5)
+				ret = -EBUSY;
 		}
 	} while (!ret);

From patchwork Thu Nov 7 23:56:13 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13867330
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 5/6] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_MAY_BLOCK mappings
Date: Thu, 7 Nov 2024 15:56:13 -0800
Message-ID: <20241107235614.3637221-6-joannelkoong@gmail.com>
In-Reply-To: <20241107235614.3637221-1-joannelkoong@gmail.com>

For migrations called in MIGRATE_SYNC mode, skip migrating the folio if it
is under writeback and has the AS_WRITEBACK_MAY_BLOCK flag set on its
mapping. If that flag is set, the writeback may take an indeterminate
amount of time to complete, so we should not wait.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 mm/migrate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index df91248755e4..1d038a4202ae 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	 */
 	switch (mode) {
 	case MIGRATE_SYNC:
-		break;
+		if (!src->mapping ||
+		    !mapping_writeback_may_block(src->mapping))
+			break;
+		fallthrough;
 	default:
 		rc = -EBUSY;
 		goto out;

From patchwork Thu Nov 7 23:56:14 2024
From: Joanne Koong <joannelkoong@gmail.com>
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com,
 linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree
Date: Thu, 7 Nov 2024 15:56:14 -0800
Message-ID: <20241107235614.3637221-7-joannelkoong@gmail.com>
In-Reply-To: <20241107235614.3637221-1-joannelkoong@gmail.com>
References: <20241107235614.3637221-1-joannelkoong@gmail.com>

Currently, we allocate and copy data to a temporary folio when handling
writeback in order to mitigate the following deadlock scenario that may
arise if reclaim waits on writeback to complete:

* a single-threaded FUSE server is in the middle of handling a request
  that needs a memory allocation
* the memory allocation triggers direct reclaim
* direct reclaim waits on a folio under writeback
* the FUSE server can't write back the folio since it's stuck in
  direct reclaim

To work around this, we allocate a temporary folio and copy over the
original folio to the temporary folio so that writeback can be
immediately cleared on the original folio.
This additionally requires us to maintain an internal rb tree to keep
track of writeback state on the temporary folios.

A recent change prevents reclaim logic from waiting on writeback for
folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set. This
commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which
prevents FUSE folios from running into the reclaim deadlock described
above) and removes the temporary folio + extra copying and the internal
rb tree.

fio benchmarks -- (using averages observed from 10 runs, throwing away
outliers)

Setup:
sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount

fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
 --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount

        bs =     1k          4k          1M
Before        351 MiB/s  1818 MiB/s  1851 MiB/s
After         341 MiB/s  2246 MiB/s  2685 MiB/s
% diff          -3%         23%         45%

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/file.c | 339 +++++--------------------------------------------
 1 file changed, 29 insertions(+), 310 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 88d0946b5bc9..f8719d8c56ca 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -415,89 +415,11 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_owner_t id)
 
 struct fuse_writepage_args {
 	struct fuse_io_args ia;
-	struct rb_node writepages_entry;
 	struct list_head queue_entry;
-	struct fuse_writepage_args *next;
 	struct inode *inode;
 	struct fuse_sync_bucket *bucket;
 };
 
-static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
-					pgoff_t idx_from, pgoff_t idx_to)
-{
-	struct rb_node *n;
-
-	n = fi->writepages.rb_node;
-
-	while (n) {
-		struct fuse_writepage_args *wpa;
-		pgoff_t curr_index;
-
-		wpa = rb_entry(n, struct fuse_writepage_args, writepages_entry);
-		WARN_ON(get_fuse_inode(wpa->inode) != fi);
-		curr_index = wpa->ia.write.in.offset >> PAGE_SHIFT;
-		if (idx_from >= curr_index + wpa->ia.ap.num_folios)
-			n = n->rb_right;
-		else if (idx_to < curr_index)
-			n = n->rb_left;
-		else
-			return wpa;
-	}
-	return NULL;
-}
-
-/*
- * Check if any page in a range is under writeback
- */
-static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
-				   pgoff_t idx_to)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-	bool found;
-
-	if (RB_EMPTY_ROOT(&fi->writepages))
-		return false;
-
-	spin_lock(&fi->lock);
-	found = fuse_find_writeback(fi, idx_from, idx_to);
-	spin_unlock(&fi->lock);
-
-	return found;
-}
-
-static inline bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
-{
-	return fuse_range_is_writeback(inode, index, index);
-}
-
-/*
- * Wait for page writeback to be completed.
- *
- * Since fuse doesn't rely on the VM writeback tracking, this has to
- * use some other means.
- */
-static void fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-
-	wait_event(fi->page_waitq, !fuse_page_is_writeback(inode, index));
-}
-
-static inline bool fuse_folio_is_writeback(struct inode *inode,
-					   struct folio *folio)
-{
-	pgoff_t last = folio_next_index(folio) - 1;
-	return fuse_range_is_writeback(inode, folio_index(folio), last);
-}
-
-static void fuse_wait_on_folio_writeback(struct inode *inode,
-					 struct folio *folio)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-
-	wait_event(fi->page_waitq, !fuse_folio_is_writeback(inode, folio));
-}
-
 /*
  * Wait for all pending writepages on the inode to finish.
  *
@@ -891,7 +813,7 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
 	 * have writeback that extends beyond the lifetime of the folio. So
 	 * make sure we read a properly synced folio.
 	 */
-	fuse_wait_on_folio_writeback(inode, folio);
+	folio_wait_writeback(folio);
 
 	attr_ver = fuse_get_attr_version(fm->fc);
 
@@ -1003,16 +925,15 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
 static void fuse_readahead(struct readahead_control *rac)
 {
 	struct inode *inode = rac->mapping->host;
-	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	unsigned int max_pages, nr_pages;
-	pgoff_t first = readahead_index(rac);
-	pgoff_t last = first + readahead_count(rac) - 1;
+	loff_t first = readahead_pos(rac);
+	loff_t last = first + readahead_length(rac) - 1;
 
 	if (fuse_is_bad(inode))
 		return;
 
-	wait_event(fi->page_waitq, !fuse_range_is_writeback(inode, first, last));
+	filemap_fdatawait_range(inode->i_mapping, first, last);
 
 	max_pages = min_t(unsigned int, fc->max_pages,
 			fc->max_read / PAGE_SIZE);
@@ -1172,7 +1093,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 	int err;
 
 	for (i = 0; i < ap->num_folios; i++)
-		fuse_wait_on_folio_writeback(inode, ap->folios[i]);
+		folio_wait_writeback(ap->folios[i]);
 
 	fuse_write_args_fill(ia, ff, pos, count);
 	ia->write.in.flags = fuse_write_flags(iocb);
@@ -1622,7 +1543,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 			return res;
 		}
 	}
-	if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
+	if (!cuse && filemap_range_has_writeback(mapping, pos, (pos + count - 1))) {
 		if (!write)
 			inode_lock(inode);
 		fuse_sync_writes(inode);
@@ -1824,8 +1745,10 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
 	if (wpa->bucket)
 		fuse_sync_bucket_dec(wpa->bucket);
 
-	for (i = 0; i < ap->num_folios; i++)
+	for (i = 0; i < ap->num_folios; i++) {
+		folio_end_writeback(ap->folios[i]);
 		folio_put(ap->folios[i]);
+	}
 
 	fuse_file_put(wpa->ia.ff, false);
 
@@ -1838,7 +1761,7 @@ static void fuse_writepage_finish_stat(struct inode *inode, struct folio *folio)
 	struct backing_dev_info *bdi = inode_to_bdi(inode);
 
 	dec_wb_stat(&bdi->wb, WB_WRITEBACK);
-	node_stat_sub_folio(folio, NR_WRITEBACK_TEMP);
+	node_stat_sub_folio(folio, NR_WRITEBACK);
 	wb_writeout_inc(&bdi->wb);
 }
 
@@ -1861,7 +1784,6 @@ static void fuse_send_writepage(struct fuse_mount *fm,
 __releases(fi->lock)
 __acquires(fi->lock)
 {
-	struct fuse_writepage_args *aux, *next;
 	struct fuse_inode *fi = get_fuse_inode(wpa->inode);
 	struct fuse_write_in *inarg = &wpa->ia.write.in;
 	struct fuse_args *args = &wpa->ia.ap.args;
@@ -1898,19 +1820,8 @@ __acquires(fi->lock)
 
 out_free:
 	fi->writectr--;
-	rb_erase(&wpa->writepages_entry, &fi->writepages);
 	fuse_writepage_finish(wpa);
 	spin_unlock(&fi->lock);
-
-	/* After rb_erase() aux request list is private */
-	for (aux = wpa->next; aux; aux = next) {
-		next = aux->next;
-		aux->next = NULL;
-		fuse_writepage_finish_stat(aux->inode,
-					   aux->ia.ap.folios[0]);
-		fuse_writepage_free(aux);
-	}
-
 	fuse_writepage_free(wpa);
 	spin_lock(&fi->lock);
 }
@@ -1938,43 +1849,6 @@ __acquires(fi->lock)
 	}
 }
 
-static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root,
-						struct fuse_writepage_args *wpa)
-{
-	pgoff_t idx_from = wpa->ia.write.in.offset >> PAGE_SHIFT;
-	pgoff_t idx_to = idx_from + wpa->ia.ap.num_folios - 1;
-	struct rb_node **p = &root->rb_node;
-	struct rb_node *parent = NULL;
-
-	WARN_ON(!wpa->ia.ap.num_folios);
-	while (*p) {
-		struct fuse_writepage_args *curr;
-		pgoff_t curr_index;
-
-		parent = *p;
-		curr = rb_entry(parent, struct fuse_writepage_args,
-				writepages_entry);
-		WARN_ON(curr->inode != wpa->inode);
-		curr_index = curr->ia.write.in.offset >> PAGE_SHIFT;
-
-		if (idx_from >= curr_index + curr->ia.ap.num_folios)
-			p = &(*p)->rb_right;
-		else if (idx_to < curr_index)
-			p = &(*p)->rb_left;
-		else
-			return curr;
-	}
-
-	rb_link_node(&wpa->writepages_entry, parent, p);
-	rb_insert_color(&wpa->writepages_entry, root);
-	return NULL;
-}
-
-static void tree_insert(struct rb_root *root, struct fuse_writepage_args *wpa)
-{
-	WARN_ON(fuse_insert_writeback(root, wpa));
-}
-
 static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 			       int error)
 {
@@ -1994,41 +1868,6 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 	if (!fc->writeback_cache)
 		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODIFY);
 	spin_lock(&fi->lock);
-	rb_erase(&wpa->writepages_entry, &fi->writepages);
-	while (wpa->next) {
-		struct fuse_mount *fm = get_fuse_mount(inode);
-		struct fuse_write_in *inarg = &wpa->ia.write.in;
-		struct fuse_writepage_args *next = wpa->next;
-
-		wpa->next = next->next;
-		next->next = NULL;
-		tree_insert(&fi->writepages, next);
-
-		/*
-		 * Skip fuse_flush_writepages() to make it easy to crop requests
-		 * based on primary request size.
-		 *
-		 * 1st case (trivial): there are no concurrent activities using
-		 * fuse_set/release_nowrite.  Then we're on safe side because
-		 * fuse_flush_writepages() would call fuse_send_writepage()
-		 * anyway.
-		 *
-		 * 2nd case: someone called fuse_set_nowrite and it is waiting
-		 * now for completion of all in-flight requests.  This happens
-		 * rarely and no more than once per page, so this should be
-		 * okay.
-		 *
-		 * 3rd case: someone (e.g. fuse_do_setattr()) is in the middle
-		 * of fuse_set_nowrite..fuse_release_nowrite section.  The fact
-		 * that fuse_set_nowrite returned implies that all in-flight
-		 * requests were completed along with all of their secondary
-		 * requests.  Further primary requests are blocked by negative
-		 * writectr.  Hence there cannot be any in-flight requests and
-		 * no invocations of fuse_writepage_end() while we're in
-		 * fuse_set_nowrite..fuse_release_nowrite section.
-		 */
-		fuse_send_writepage(fm, next, inarg->offset + inarg->size);
-	}
 	fi->writectr--;
 	fuse_writepage_finish(wpa);
 	spin_unlock(&fi->lock);
@@ -2115,19 +1954,18 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
 }
 
 static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
-					  struct folio *tmp_folio, uint32_t folio_index)
+					  uint32_t folio_index)
 {
 	struct inode *inode = folio->mapping->host;
 	struct fuse_args_pages *ap = &wpa->ia.ap;
 
-	folio_copy(tmp_folio, folio);
-
-	ap->folios[folio_index] = tmp_folio;
+	folio_get(folio);
+	ap->folios[folio_index] = folio;
 	ap->descs[folio_index].offset = 0;
 	ap->descs[folio_index].length = PAGE_SIZE;
 
 	inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
-	node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP);
+	node_stat_add_folio(folio, NR_WRITEBACK);
 }
 
 static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
@@ -2162,18 +2000,12 @@ static int fuse_writepage_locked(struct folio *folio)
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_writepage_args *wpa;
 	struct fuse_args_pages *ap;
-	struct folio *tmp_folio;
 	struct fuse_file *ff;
-	int error = -ENOMEM;
-
-	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
-	if (!tmp_folio)
-		goto err;
+	int error = -EIO;
 
-	error = -EIO;
 	ff = fuse_write_file_get(fi);
 	if (!ff)
-		goto err_nofile;
+		goto err;
 
 	wpa = fuse_writepage_args_setup(folio, ff);
 	error = -ENOMEM;
@@ -2184,22 +2016,17 @@ static int fuse_writepage_locked(struct folio *folio)
 	ap->num_folios = 1;
 
 	folio_start_writeback(folio);
-	fuse_writepage_args_page_fill(wpa, folio, tmp_folio, 0);
+	fuse_writepage_args_page_fill(wpa, folio, 0);
 
 	spin_lock(&fi->lock);
-	tree_insert(&fi->writepages, wpa);
 	list_add_tail(&wpa->queue_entry, &fi->queued_writes);
 	fuse_flush_writepages(inode);
 	spin_unlock(&fi->lock);
 
-	folio_end_writeback(folio);
-
 	return 0;
 
 err_writepage_args:
 	fuse_file_put(ff, false);
-err_nofile:
-	folio_put(tmp_folio);
 err:
 	mapping_set_error(folio->mapping, error);
 	return error;
@@ -2209,7 +2036,6 @@ struct fuse_fill_wb_data {
 	struct fuse_writepage_args *wpa;
 	struct fuse_file *ff;
 	struct inode *inode;
-	struct folio **orig_folios;
 	unsigned int max_folios;
 };
 
@@ -2244,69 +2070,11 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
 	struct fuse_writepage_args *wpa = data->wpa;
 	struct inode *inode = data->inode;
 	struct fuse_inode *fi = get_fuse_inode(inode);
-	int num_folios = wpa->ia.ap.num_folios;
-	int i;
 
 	spin_lock(&fi->lock);
 	list_add_tail(&wpa->queue_entry, &fi->queued_writes);
 	fuse_flush_writepages(inode);
 	spin_unlock(&fi->lock);
-
-	for (i = 0; i < num_folios; i++)
-		folio_end_writeback(data->orig_folios[i]);
-}
-
-/*
- * Check under fi->lock if the page is under writeback, and insert it onto the
- * rb_tree if not. Otherwise iterate auxiliary write requests, to see if there's
- * one already added for a page at this offset. If there's none, then insert
- * this new request onto the auxiliary list, otherwise reuse the existing one by
- * swapping the new temp page with the old one.
- */
-static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa,
-			       struct folio *folio)
-{
-	struct fuse_inode *fi = get_fuse_inode(new_wpa->inode);
-	struct fuse_writepage_args *tmp;
-	struct fuse_writepage_args *old_wpa;
-	struct fuse_args_pages *new_ap = &new_wpa->ia.ap;
-
-	WARN_ON(new_ap->num_folios != 0);
-	new_ap->num_folios = 1;
-
-	spin_lock(&fi->lock);
-	old_wpa = fuse_insert_writeback(&fi->writepages, new_wpa);
-	if (!old_wpa) {
-		spin_unlock(&fi->lock);
-		return true;
-	}
-
-	for (tmp = old_wpa->next; tmp; tmp = tmp->next) {
-		pgoff_t curr_index;
-
-		WARN_ON(tmp->inode != new_wpa->inode);
-		curr_index = tmp->ia.write.in.offset >> PAGE_SHIFT;
-		if (curr_index == folio->index) {
-			WARN_ON(tmp->ia.ap.num_folios != 1);
-			swap(tmp->ia.ap.folios[0], new_ap->folios[0]);
-			break;
-		}
-	}
-
-	if (!tmp) {
-		new_wpa->next = old_wpa->next;
-		old_wpa->next = new_wpa;
-	}
-
-	spin_unlock(&fi->lock);
-
-	if (tmp) {
-		fuse_writepage_finish_stat(new_wpa->inode,
-					   folio);
-		fuse_writepage_free(new_wpa);
-	}
-
-	return false;
 }
 
 static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
@@ -2315,15 +2083,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
 {
 	WARN_ON(!ap->num_folios);
 
-	/*
-	 * Being under writeback is unlikely but possible.  For example direct
-	 * read to an mmaped fuse file will set the page dirty twice; once when
-	 * the pages are faulted with get_user_pages(), and then after the read
-	 * completed.
-	 */
-	if (fuse_folio_is_writeback(data->inode, folio))
-		return true;
-
 	/* Reached max pages */
 	if (ap->num_folios == fc->max_pages)
 		return true;
@@ -2333,7 +2092,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
 		return true;
 
 	/* Discontinuity */
-	if (data->orig_folios[ap->num_folios - 1]->index + 1 != folio_index(folio))
+	if (ap->folios[ap->num_folios - 1]->index + 1 != folio_index(folio))
 		return true;
 
 	/* Need to grow the pages array?  If so, did the expansion fail? */
@@ -2352,7 +2111,6 @@ static int fuse_writepages_fill(struct folio *folio,
 	struct inode *inode = data->inode;
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	struct folio *tmp_folio;
 	int err;
 
 	if (!data->ff) {
@@ -2367,54 +2125,23 @@ static int fuse_writepages_fill(struct folio *folio,
 		data->wpa = NULL;
 	}
 
-	err = -ENOMEM;
-	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
-	if (!tmp_folio)
-		goto out_unlock;
-
-	/*
-	 * The page must not be redirtied until the writeout is completed
-	 * (i.e. userspace has sent a reply to the write request).  Otherwise
-	 * there could be more than one temporary page instance for each real
-	 * page.
-	 *
-	 * This is ensured by holding the page lock in page_mkwrite() while
-	 * checking fuse_page_is_writeback().  We already hold the page lock
-	 * since clear_page_dirty_for_io() and keep it held until we add the
-	 * request to the fi->writepages list and increment ap->num_folios.
-	 * After this fuse_page_is_writeback() will indicate that the page is
-	 * under writeback, so we can release the page lock.
-	 */
 	if (data->wpa == NULL) {
 		err = -ENOMEM;
 		wpa = fuse_writepage_args_setup(folio, data->ff);
-		if (!wpa) {
-			folio_put(tmp_folio);
+		if (!wpa)
 			goto out_unlock;
-		}
 		fuse_file_get(wpa->ia.ff);
 		data->max_folios = 1;
 		ap = &wpa->ia.ap;
 	}
 
 	folio_start_writeback(folio);
-	fuse_writepage_args_page_fill(wpa, folio, tmp_folio, ap->num_folios);
-	data->orig_folios[ap->num_folios] = folio;
+	fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
 
 	err = 0;
-	if (data->wpa) {
-		/*
-		 * Protected by fi->lock against concurrent access by
-		 * fuse_page_is_writeback().
-		 */
-		spin_lock(&fi->lock);
-		ap->num_folios++;
-		spin_unlock(&fi->lock);
-	} else if (fuse_writepage_add(wpa, folio)) {
+	ap->num_folios++;
+	if (!data->wpa)
 		data->wpa = wpa;
-	} else {
-		folio_end_writeback(folio);
-	}
 out_unlock:
 	folio_unlock(folio);
 
@@ -2441,13 +2168,6 @@ static int fuse_writepages(struct address_space *mapping,
 	data.wpa = NULL;
 	data.ff = NULL;
 
-	err = -ENOMEM;
-	data.orig_folios = kcalloc(fc->max_pages,
-				   sizeof(struct folio *),
-				   GFP_NOFS);
-	if (!data.orig_folios)
-		goto out;
-
 	err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
 	if (data.wpa) {
 		WARN_ON(!data.wpa->ia.ap.num_folios);
@@ -2456,7 +2176,6 @@ static int fuse_writepages(struct address_space *mapping,
 	if (data.ff)
 		fuse_file_put(data.ff, false);
 
-	kfree(data.orig_folios);
 out:
 	return err;
 }
@@ -2481,7 +2200,7 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 	if (IS_ERR(folio))
 		goto error;
 
-	fuse_wait_on_page_writeback(mapping->host, folio->index);
+	folio_wait_writeback(folio);
 
 	if (folio_test_uptodate(folio) || len >= folio_size(folio))
 		goto success;
@@ -2545,13 +2264,11 @@ static int fuse_launder_folio(struct folio *folio)
 {
 	int err = 0;
 	if (folio_clear_dirty_for_io(folio)) {
-		struct inode *inode = folio->mapping->host;
-
 		/* Serialize with pending writeback for the same page */
-		fuse_wait_on_page_writeback(inode, folio->index);
+		folio_wait_writeback(folio);
 		err = fuse_writepage_locked(folio);
 		if (!err)
-			fuse_wait_on_page_writeback(inode, folio->index);
+			folio_wait_writeback(folio);
 	}
 	return err;
 }
@@ -2595,7 +2312,7 @@ static vm_fault_t fuse_page_mkwrite(struct vm_fault *vmf)
 		return VM_FAULT_NOPAGE;
 	}
 
-	fuse_wait_on_folio_writeback(inode, folio);
+	folio_wait_writeback(folio);
 	return VM_FAULT_LOCKED;
 }
 
@@ -3413,9 +3130,12 @@ static const struct address_space_operations fuse_file_aops = {
 void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
+	if (fc->writeback_cache)
+		mapping_set_writeback_may_block(&inode->i_data);
 
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
@@ -3423,7 +3143,6 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	fi->iocachectr = 0;
 	init_waitqueue_head(&fi->page_waitq);
 	init_waitqueue_head(&fi->direct_io_waitq);
-	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
 		fuse_dax_inode_init(inode, flags);