From patchwork Thu Nov 7 19:16:12 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13866959
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v3 1/6] mm: add AS_WRITEBACK_MAY_BLOCK mapping flag
Date: Thu, 7 Nov 2024 11:16:12 -0800
Message-ID: <20241107191618.2011146-2-joannelkoong@gmail.com>
In-Reply-To: <20241107191618.2011146-1-joannelkoong@gmail.com>
References: <20241107191618.2011146-1-joannelkoong@gmail.com>

Add a new mapping flag AS_WRITEBACK_MAY_BLOCK which filesystems may set
to indicate that writeback operations may block or take an indeterminate
amount of time to complete. Extra caution should be taken when waiting
on writeback for folios belonging to mappings where this flag is set.
Signed-off-by: Joanne Koong
---
 include/linux/pagemap.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 68a5f1ff3301..eb5a7837e142 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
+	AS_WRITEBACK_MAY_BLOCK = 9,	/* Use caution when waiting on writeback */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -335,6 +336,16 @@ static inline bool mapping_inaccessible(struct address_space *mapping)
 	return test_bit(AS_INACCESSIBLE, &mapping->flags);
 }

+static inline void mapping_set_writeback_may_block(struct address_space *mapping)
+{
+	set_bit(AS_WRITEBACK_MAY_BLOCK, &mapping->flags);
+}
+
+static inline bool mapping_writeback_may_block(struct address_space *mapping)
+{
+	return test_bit(AS_WRITEBACK_MAY_BLOCK, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;

From patchwork Thu Nov 7 19:16:13 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13866960
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v3 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block
Date: Thu, 7 Nov 2024 11:16:13 -0800
Message-ID: <20241107191618.2011146-3-joannelkoong@gmail.com>
In-Reply-To: <20241107191618.2011146-1-joannelkoong@gmail.com>
References: <20241107191618.2011146-1-joannelkoong@gmail.com>

Currently in shrink_folio_list(), reclaim for folios under writeback
falls into 3 different cases:

1) Reclaim is encountering an excessive number of folios under writeback
   and this folio has both the writeback and reclaim flags set

2) Dirty throttling is enabled (this happens if reclaim through cgroup
   is not enabled, if reclaim through cgroupv2 memcg is enabled, or if
   reclaim is on the root cgroup), or if the folio is not marked for
   immediate reclaim, or if the caller does not have __GFP_FS (or
   __GFP_IO if it's going to swap) set

3) Legacy cgroupv1 encounters a folio that already has the reclaim flag
   set and the caller did not have __GFP_FS (or __GFP_IO if swap) set

In cases 1) and 2), we activate the folio and skip reclaiming it, while
in case 3), we wait for writeback to finish on the folio and then try to
reclaim the folio again. In case 3), we wait on writeback because
cgroupv1 does not have dirty folio throttling; as such, this is a
mitigation against the case where there are too many folios in writeback
with nothing else to reclaim.
The issue is that for filesystems where writeback may block, sub-optimal
workarounds may need to be put in place to avoid a potential deadlock
that may arise from reclaim waiting on writeback. (Even though case 3
above is rare given that legacy cgroupv1 is on its way to being
deprecated, this case still needs to be accounted for.) For example, for
FUSE filesystems, a temp page gets allocated per dirty page and the
contents of the dirty page are copied over to the temp page so that
writeback can be immediately cleared on the dirty page, in order to
avoid the following deadlock:

* single-threaded FUSE server is in the middle of handling a request
  that needs a memory allocation
* memory allocation triggers direct reclaim
* direct reclaim waits on a folio under writeback (i.e. falls into case
  3 above) that needs to be written back to the FUSE server
* the FUSE server can't write back the folio since it's stuck in direct
  reclaim

In this commit, if legacy memcg encounters a folio with the reclaim flag
set (i.e. case 3) and the folio belongs to a mapping that has the
AS_WRITEBACK_MAY_BLOCK flag set, the folio will instead be activated and
skip reclaim (i.e. default to the behavior in case 2). This allows the
suboptimal workarounds added to address the "reclaim waits on writeback"
deadlock scenario to be removed.

Signed-off-by: Joanne Koong
---
 mm/vmscan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749cdc110c74..e9755cb7211b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1110,6 +1110,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		if (writeback && folio_test_reclaim(folio))
 			stat->nr_congested += nr_pages;

+		mapping = folio_mapping(folio);
+
 		/*
 		 * If a folio at the tail of the LRU is under writeback, there
 		 * are three cases to consider.
@@ -1129,8 +1131,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 * 2) Global or new memcg reclaim encounters a folio that is
 		 *    not marked for immediate reclaim, or the caller does not
 		 *    have __GFP_FS (or __GFP_IO if it's simply going to swap,
-		 *    not to fs). In this case mark the folio for immediate
-		 *    reclaim and continue scanning.
+		 *    not to fs), or writebacks in the mapping may block.
+		 *    In this case mark the folio for immediate reclaim and
+		 *    continue scanning.
 		 *
 		 * Require may_enter_fs() because we would wait on fs, which
 		 * may not have submitted I/O yet. And the loop driver might
@@ -1165,7 +1168,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			/* Case 2 above */
 			} else if (writeback_throttling_sane(sc) ||
 			    !folio_test_reclaim(folio) ||
-			    !may_enter_fs(folio, sc->gfp_mask)) {
+			    !may_enter_fs(folio, sc->gfp_mask) ||
+			    (mapping && mapping_writeback_may_block(mapping))) {
 				/*
 				 * This is slightly racy -
 				 * folio_end_writeback() might have

From patchwork Thu Nov 7 19:16:14 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13866961
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v3 3/6] fs/writeback: in wait_sb_inodes(), skip wait for AS_WRITEBACK_MAY_BLOCK mappings
Date: Thu, 7 Nov 2024 11:16:14 -0800
Message-ID: <20241107191618.2011146-4-joannelkoong@gmail.com>
In-Reply-To: <20241107191618.2011146-1-joannelkoong@gmail.com>
References: <20241107191618.2011146-1-joannelkoong@gmail.com>

For filesystems with the AS_WRITEBACK_MAY_BLOCK flag set, writeback
operations may block or
take an indeterminate time to complete. For example, writing data back
to disk in FUSE filesystems depends on the userspace server successfully
completing writeback.

In this commit, wait_sb_inodes() skips waiting on writeback if the
inode's mapping has AS_WRITEBACK_MAY_BLOCK set; otherwise sync(2) may
take an indeterminate amount of time to complete. Callers who wish to
ensure that the data for a mapping with the AS_WRITEBACK_MAY_BLOCK flag
set has actually been written back to disk should use
fsync(2)/fdatasync(2) instead.

Signed-off-by: Joanne Koong
---
 fs/fs-writeback.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d8bec3c1bb1f..c80c45972162 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2659,6 +2659,9 @@ static void wait_sb_inodes(struct super_block *sb)
 		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
 			continue;

+		if (mapping_writeback_may_block(mapping))
+			continue;
+
 		spin_unlock_irq(&sb->s_inode_wblist_lock);
 		spin_lock(&inode->i_lock);

From patchwork Thu Nov 7 19:16:15 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13866962
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v3 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails
Date: Thu, 7 Nov 2024 11:16:15 -0800
Message-ID: <20241107191618.2011146-5-joannelkoong@gmail.com>
In-Reply-To: <20241107191618.2011146-1-joannelkoong@gmail.com>
References: <20241107191618.2011146-1-joannelkoong@gmail.com>

In offline_pages(), do_migrate_range() may potentially retry forever if
the migration fails.
Add a return value for do_migrate_range(), and allow offline_pages() to
try migrating pages 5 times before erroring out, similar to how
migration failures in __alloc_contig_migrate_range() are handled.

Signed-off-by: Joanne Koong
---
 mm/memory_hotplug.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 621ae1015106..49402442ea3b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1770,13 +1770,14 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 	return 0;
 }

-static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct folio *folio;
 	unsigned long pfn;
 	LIST_HEAD(source);
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
+	int ret = 0;

 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		struct page *page;
@@ -1833,7 +1834,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
 			.reason = MR_MEMORY_HOTPLUG,
 		};
-		int ret;

 		/*
 		 * We have checked that migration range is on a single zone so
@@ -1863,6 +1863,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			putback_movable_pages(&source);
 		}
 	}
+	return ret;
 }

 static int __init cmdline_parse_movable_node(char *p)
@@ -1940,6 +1941,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	const int node = zone_to_nid(zone);
 	unsigned long flags;
 	struct memory_notify arg;
+	unsigned int tries = 0;
 	char *reason;
 	int ret;

@@ -2028,11 +2030,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		ret = scan_movable_pages(pfn, end_pfn, &pfn);
 		if (!ret) {
-			/*
-			 * TODO: fatal migration failures should bail
-			 * out
-			 */
-			do_migrate_range(pfn, end_pfn);
+			if (do_migrate_range(pfn, end_pfn) && ++tries == 5)
+				ret = -EBUSY;
 		}
 	} while (!ret);

From
patchwork Thu Nov 7 19:16:16 2024
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13866963
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v3 5/6] mm/migrate: fail MIGRATE_SYNC for folios under writeback with AS_WRITEBACK_MAY_BLOCK mappings
Date: Thu, 7 Nov 2024 11:16:16 -0800
Message-ID: <20241107191618.2011146-6-joannelkoong@gmail.com>
In-Reply-To: <20241107191618.2011146-1-joannelkoong@gmail.com>
References: <20241107191618.2011146-1-joannelkoong@gmail.com>

For folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set, fail
MIGRATE_SYNC mode migration with -EBUSY if the folio is currently under
writeback. When the AS_WRITEBACK_MAY_BLOCK flag is set on the mapping,
writeback may take an indeterminate amount of time to complete, so we
cannot wait on it.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 mm/migrate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index df91248755e4..1d038a4202ae 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	 */
 	switch (mode) {
 	case MIGRATE_SYNC:
-		break;
+		if (!src->mapping ||
+		    !mapping_writeback_may_block(src->mapping))
+			break;
+		fallthrough;
 	default:
 		rc = -EBUSY;
 		goto out;

From patchwork Thu Nov 7 19:16:18 2024
From: Joanne Koong <joannelkoong@gmail.com>
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v3 6/6] fuse: remove tmp folio for writebacks and internal rb tree
Date: Thu, 7 Nov 2024 11:16:18 -0800
Message-ID: <20241107191618.2011146-8-joannelkoong@gmail.com>
In-Reply-To: <20241107191618.2011146-1-joannelkoong@gmail.com>
References: <20241107191618.2011146-1-joannelkoong@gmail.com>

Currently, we allocate and copy data to a temporary folio when handling
writeback in order to mitigate the following deadlock scenario that may
arise if reclaim waits on writeback to complete:

* single-threaded FUSE server is in the middle of handling a request
  that needs a memory allocation
* memory allocation triggers direct reclaim
* direct reclaim waits on a folio under writeback
* the FUSE server can't write back the folio since it's stuck in
  direct reclaim

To work around this, we allocate a temporary folio and copy over the
original folio to the temporary folio so that writeback can be
immediately cleared on the original folio.
This additionally requires us to maintain an internal rb tree to keep
track of writeback state on the temporary folios.

A recent change prevents reclaim logic from waiting on writeback for
folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set. This
commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which will
prevent FUSE folios from running into the reclaim deadlock described
above) and removes the temporary folio + extra copying and the internal
rb tree.

fio benchmarks -- (using averages observed from 10 runs, throwing away
outliers)

Setup:
sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount

fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount

        bs =        1k          4k            1M
Before       351 MiB/s   1818 MiB/s    1851 MiB/s
After        341 MiB/s   2246 MiB/s    2685 MiB/s
% diff           -3%         23%           45%

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/file.c | 339 +++++-------------------------------------------
 1 file changed, 30 insertions(+), 309 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 88d0946b5bc9..a2e91fdd8521 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -415,89 +415,11 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_owner_t id)
 
 struct fuse_writepage_args {
 	struct fuse_io_args ia;
-	struct rb_node writepages_entry;
 	struct list_head queue_entry;
-	struct fuse_writepage_args *next;
 	struct inode *inode;
 	struct fuse_sync_bucket *bucket;
 };
 
-static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
-						       pgoff_t idx_from, pgoff_t idx_to)
-{
-	struct rb_node *n;
-
-	n = fi->writepages.rb_node;
-
-	while (n) {
-		struct fuse_writepage_args *wpa;
-		pgoff_t curr_index;
-
-		wpa = rb_entry(n, struct fuse_writepage_args, writepages_entry);
-		WARN_ON(get_fuse_inode(wpa->inode) != fi);
-		curr_index = wpa->ia.write.in.offset >> PAGE_SHIFT;
-		if (idx_from >= curr_index + wpa->ia.ap.num_folios)
-			n = n->rb_right;
-		else if (idx_to < curr_index)
-			n = n->rb_left;
-		else
-			return wpa;
-	}
-	return NULL;
-}
-
-/*
- * Check if any page in a range is under writeback
- */
-static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
-				    pgoff_t idx_to)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-	bool found;
-
-	if (RB_EMPTY_ROOT(&fi->writepages))
-		return false;
-
-	spin_lock(&fi->lock);
-	found = fuse_find_writeback(fi, idx_from, idx_to);
-	spin_unlock(&fi->lock);
-
-	return found;
-}
-
-static inline bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
-{
-	return fuse_range_is_writeback(inode, index, index);
-}
-
-/*
- * Wait for page writeback to be completed.
- *
- * Since fuse doesn't rely on the VM writeback tracking, this has to
- * use some other means.
- */
-static void fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-
-	wait_event(fi->page_waitq, !fuse_page_is_writeback(inode, index));
-}
-
-static inline bool fuse_folio_is_writeback(struct inode *inode,
-					   struct folio *folio)
-{
-	pgoff_t last = folio_next_index(folio) - 1;
-	return fuse_range_is_writeback(inode, folio_index(folio), last);
-}
-
-static void fuse_wait_on_folio_writeback(struct inode *inode,
-					 struct folio *folio)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-
-	wait_event(fi->page_waitq, !fuse_folio_is_writeback(inode, folio));
-}
-
 /*
  * Wait for all pending writepages on the inode to finish.
  *
@@ -891,7 +813,7 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
 	 * have writeback that extends beyond the lifetime of the folio. So
 	 * make sure we read a properly synced folio.
 	 */
-	fuse_wait_on_folio_writeback(inode, folio);
+	folio_wait_writeback(folio);
 
 	attr_ver = fuse_get_attr_version(fm->fc);
 
@@ -1006,13 +928,14 @@ static void fuse_readahead(struct readahead_control *rac)
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	unsigned int max_pages, nr_pages;
-	pgoff_t first = readahead_index(rac);
-	pgoff_t last = first + readahead_count(rac) - 1;
+	loff_t first = readahead_pos(rac);
+	loff_t last = first + readahead_length(rac) - 1;
 
 	if (fuse_is_bad(inode))
 		return;
 
-	wait_event(fi->page_waitq, !fuse_range_is_writeback(inode, first, last));
+	wait_event(fi->page_waitq,
+		   !filemap_range_has_writeback(rac->mapping, first, last));
 
 	max_pages = min_t(unsigned int, fc->max_pages,
 			  fc->max_read / PAGE_SIZE);
@@ -1172,7 +1095,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 	int err;
 
 	for (i = 0; i < ap->num_folios; i++)
-		fuse_wait_on_folio_writeback(inode, ap->folios[i]);
+		folio_wait_writeback(ap->folios[i]);
 
 	fuse_write_args_fill(ia, ff, pos, count);
 	ia->write.in.flags = fuse_write_flags(iocb);
@@ -1622,7 +1545,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 			return res;
 		}
 	}
-	if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
+	if (!cuse && filemap_range_has_writeback(mapping, pos, (pos + count - 1))) {
 		if (!write)
 			inode_lock(inode);
 		fuse_sync_writes(inode);
@@ -1824,8 +1747,10 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
 	if (wpa->bucket)
 		fuse_sync_bucket_dec(wpa->bucket);
 
-	for (i = 0; i < ap->num_folios; i++)
+	for (i = 0; i < ap->num_folios; i++) {
+		folio_end_writeback(ap->folios[i]);
 		folio_put(ap->folios[i]);
+	}
 
 	fuse_file_put(wpa->ia.ff, false);
 
@@ -1838,7 +1763,7 @@ static void fuse_writepage_finish_stat(struct inode *inode, struct folio *folio)
 	struct backing_dev_info *bdi = inode_to_bdi(inode);
 
 	dec_wb_stat(&bdi->wb, WB_WRITEBACK);
-	node_stat_sub_folio(folio, NR_WRITEBACK_TEMP);
+	node_stat_sub_folio(folio, NR_WRITEBACK);
 	wb_writeout_inc(&bdi->wb);
 }
 
@@ -1861,7 +1786,6 @@ static void fuse_send_writepage(struct fuse_mount *fm,
 __releases(fi->lock)
 __acquires(fi->lock)
 {
-	struct fuse_writepage_args *aux, *next;
 	struct fuse_inode *fi = get_fuse_inode(wpa->inode);
 	struct fuse_write_in *inarg = &wpa->ia.write.in;
 	struct fuse_args *args = &wpa->ia.ap.args;
@@ -1898,19 +1822,8 @@ __acquires(fi->lock)
 
 out_free:
 	fi->writectr--;
-	rb_erase(&wpa->writepages_entry, &fi->writepages);
 	fuse_writepage_finish(wpa);
 	spin_unlock(&fi->lock);
-
-	/* After rb_erase() aux request list is private */
-	for (aux = wpa->next; aux; aux = next) {
-		next = aux->next;
-		aux->next = NULL;
-		fuse_writepage_finish_stat(aux->inode,
-					   aux->ia.ap.folios[0]);
-		fuse_writepage_free(aux);
-	}
-
 	fuse_writepage_free(wpa);
 	spin_lock(&fi->lock);
 }
@@ -1938,43 +1851,6 @@ __acquires(fi->lock)
 	}
 }
 
-static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root,
-							 struct fuse_writepage_args *wpa)
-{
-	pgoff_t idx_from = wpa->ia.write.in.offset >> PAGE_SHIFT;
-	pgoff_t idx_to = idx_from + wpa->ia.ap.num_folios - 1;
-	struct rb_node **p = &root->rb_node;
-	struct rb_node *parent = NULL;
-
-	WARN_ON(!wpa->ia.ap.num_folios);
-	while (*p) {
-		struct fuse_writepage_args *curr;
-		pgoff_t curr_index;
-
-		parent = *p;
-		curr = rb_entry(parent, struct fuse_writepage_args,
-				writepages_entry);
-		WARN_ON(curr->inode != wpa->inode);
-		curr_index = curr->ia.write.in.offset >> PAGE_SHIFT;
-
-		if (idx_from >= curr_index + curr->ia.ap.num_folios)
-			p = &(*p)->rb_right;
-		else if (idx_to < curr_index)
-			p = &(*p)->rb_left;
-		else
-			return curr;
-	}
-
-	rb_link_node(&wpa->writepages_entry, parent, p);
-	rb_insert_color(&wpa->writepages_entry, root);
-	return NULL;
-}
-
-static void tree_insert(struct rb_root *root, struct fuse_writepage_args *wpa)
-{
-	WARN_ON(fuse_insert_writeback(root, wpa));
-}
-
 static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 			       int error)
 {
@@ -1994,41 +1870,6 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 	if (!fc->writeback_cache)
 		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODIFY);
 	spin_lock(&fi->lock);
-	rb_erase(&wpa->writepages_entry, &fi->writepages);
-	while (wpa->next) {
-		struct fuse_mount *fm = get_fuse_mount(inode);
-		struct fuse_write_in *inarg = &wpa->ia.write.in;
-		struct fuse_writepage_args *next = wpa->next;
-
-		wpa->next = next->next;
-		next->next = NULL;
-		tree_insert(&fi->writepages, next);
-
-		/*
-		 * Skip fuse_flush_writepages() to make it easy to crop requests
-		 * based on primary request size.
-		 *
-		 * 1st case (trivial): there are no concurrent activities using
-		 * fuse_set/release_nowrite. Then we're on safe side because
-		 * fuse_flush_writepages() would call fuse_send_writepage()
-		 * anyway.
-		 *
-		 * 2nd case: someone called fuse_set_nowrite and it is waiting
-		 * now for completion of all in-flight requests. This happens
-		 * rarely and no more than once per page, so this should be
-		 * okay.
-		 *
-		 * 3rd case: someone (e.g. fuse_do_setattr()) is in the middle
-		 * of fuse_set_nowrite..fuse_release_nowrite section. The fact
-		 * that fuse_set_nowrite returned implies that all in-flight
-		 * requests were completed along with all of their secondary
-		 * requests. Further primary requests are blocked by negative
-		 * writectr. Hence there cannot be any in-flight requests and
-		 * no invocations of fuse_writepage_end() while we're in
-		 * fuse_set_nowrite..fuse_release_nowrite section.
-		 */
-		fuse_send_writepage(fm, next, inarg->offset + inarg->size);
-	}
 	fi->writectr--;
 	fuse_writepage_finish(wpa);
 	spin_unlock(&fi->lock);
@@ -2115,19 +1956,18 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
 }
 
 static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
-					  struct folio *tmp_folio, uint32_t folio_index)
+					  uint32_t folio_index)
 {
 	struct inode *inode = folio->mapping->host;
 	struct fuse_args_pages *ap = &wpa->ia.ap;
 
-	folio_copy(tmp_folio, folio);
-
-	ap->folios[folio_index] = tmp_folio;
+	folio_get(folio);
+	ap->folios[folio_index] = folio;
 	ap->descs[folio_index].offset = 0;
 	ap->descs[folio_index].length = PAGE_SIZE;
 
 	inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
-	node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP);
+	node_stat_add_folio(folio, NR_WRITEBACK);
 }
 
 static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
@@ -2162,18 +2002,12 @@ static int fuse_writepage_locked(struct folio *folio)
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_writepage_args *wpa;
 	struct fuse_args_pages *ap;
-	struct folio *tmp_folio;
 	struct fuse_file *ff;
-	int error = -ENOMEM;
-
-	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
-	if (!tmp_folio)
-		goto err;
+	int error = -EIO;
 
-	error = -EIO;
 	ff = fuse_write_file_get(fi);
 	if (!ff)
-		goto err_nofile;
+		goto err;
 
 	wpa = fuse_writepage_args_setup(folio, ff);
 	error = -ENOMEM;
@@ -2184,22 +2018,17 @@ static int fuse_writepage_locked(struct folio *folio)
 	ap->num_folios = 1;
 
 	folio_start_writeback(folio);
-	fuse_writepage_args_page_fill(wpa, folio, tmp_folio, 0);
+	fuse_writepage_args_page_fill(wpa, folio, 0);
 
 	spin_lock(&fi->lock);
-	tree_insert(&fi->writepages, wpa);
 	list_add_tail(&wpa->queue_entry, &fi->queued_writes);
 	fuse_flush_writepages(inode);
 	spin_unlock(&fi->lock);
 
-	folio_end_writeback(folio);
-
 	return 0;
 
 err_writepage_args:
 	fuse_file_put(ff, false);
-err_nofile:
-	folio_put(tmp_folio);
 err:
 	mapping_set_error(folio->mapping, error);
 	return error;
@@ -2209,7 +2038,6 @@ struct fuse_fill_wb_data {
 	struct fuse_writepage_args *wpa;
 	struct fuse_file *ff;
 	struct inode *inode;
-	struct folio **orig_folios;
 	unsigned int max_folios;
 };
 
@@ -2244,69 +2072,11 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
 	struct fuse_writepage_args *wpa = data->wpa;
 	struct inode *inode = data->inode;
 	struct fuse_inode *fi = get_fuse_inode(inode);
-	int num_folios = wpa->ia.ap.num_folios;
-	int i;
 
 	spin_lock(&fi->lock);
 	list_add_tail(&wpa->queue_entry, &fi->queued_writes);
 	fuse_flush_writepages(inode);
 	spin_unlock(&fi->lock);
-
-	for (i = 0; i < num_folios; i++)
-		folio_end_writeback(data->orig_folios[i]);
-}
-
-/*
- * Check under fi->lock if the page is under writeback, and insert it onto the
- * rb_tree if not. Otherwise iterate auxiliary write requests, to see if there's
- * one already added for a page at this offset. If there's none, then insert
- * this new request onto the auxiliary list, otherwise reuse the existing one by
- * swapping the new temp page with the old one.
- */
-static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa,
-			       struct folio *folio)
-{
-	struct fuse_inode *fi = get_fuse_inode(new_wpa->inode);
-	struct fuse_writepage_args *tmp;
-	struct fuse_writepage_args *old_wpa;
-	struct fuse_args_pages *new_ap = &new_wpa->ia.ap;
-
-	WARN_ON(new_ap->num_folios != 0);
-	new_ap->num_folios = 1;
-
-	spin_lock(&fi->lock);
-	old_wpa = fuse_insert_writeback(&fi->writepages, new_wpa);
-	if (!old_wpa) {
-		spin_unlock(&fi->lock);
-		return true;
-	}
-
-	for (tmp = old_wpa->next; tmp; tmp = tmp->next) {
-		pgoff_t curr_index;
-
-		WARN_ON(tmp->inode != new_wpa->inode);
-		curr_index = tmp->ia.write.in.offset >> PAGE_SHIFT;
-		if (curr_index == folio->index) {
-			WARN_ON(tmp->ia.ap.num_folios != 1);
-			swap(tmp->ia.ap.folios[0], new_ap->folios[0]);
-			break;
-		}
-	}
-
-	if (!tmp) {
-		new_wpa->next = old_wpa->next;
-		old_wpa->next = new_wpa;
-	}
-
-	spin_unlock(&fi->lock);
-
-	if (tmp) {
-		fuse_writepage_finish_stat(new_wpa->inode,
-					   folio);
-		fuse_writepage_free(new_wpa);
-	}
-
-	return false;
 }
 
 static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
@@ -2315,15 +2085,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
 {
 	WARN_ON(!ap->num_folios);
 
-	/*
-	 * Being under writeback is unlikely but possible. For example direct
-	 * read to an mmaped fuse file will set the page dirty twice; once when
-	 * the pages are faulted with get_user_pages(), and then after the read
-	 * completed.
-	 */
-	if (fuse_folio_is_writeback(data->inode, folio))
-		return true;
-
 	/* Reached max pages */
 	if (ap->num_folios == fc->max_pages)
 		return true;
@@ -2333,7 +2094,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
 		return true;
 
 	/* Discontinuity */
-	if (data->orig_folios[ap->num_folios - 1]->index + 1 != folio_index(folio))
+	if (ap->folios[ap->num_folios - 1]->index + 1 != folio_index(folio))
 		return true;
 
 	/* Need to grow the pages array? If so, did the expansion fail? */
@@ -2352,7 +2113,6 @@ static int fuse_writepages_fill(struct folio *folio,
 	struct inode *inode = data->inode;
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	struct folio *tmp_folio;
 	int err;
 
 	if (!data->ff) {
@@ -2367,54 +2127,23 @@ static int fuse_writepages_fill(struct folio *folio,
 		data->wpa = NULL;
 	}
 
-	err = -ENOMEM;
-	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
-	if (!tmp_folio)
-		goto out_unlock;
-
-	/*
-	 * The page must not be redirtied until the writeout is completed
-	 * (i.e. userspace has sent a reply to the write request). Otherwise
-	 * there could be more than one temporary page instance for each real
-	 * page.
-	 *
-	 * This is ensured by holding the page lock in page_mkwrite() while
-	 * checking fuse_page_is_writeback(). We already hold the page lock
-	 * since clear_page_dirty_for_io() and keep it held until we add the
-	 * request to the fi->writepages list and increment ap->num_folios.
-	 * After this fuse_page_is_writeback() will indicate that the page is
-	 * under writeback, so we can release the page lock.
-	 */
 	if (data->wpa == NULL) {
 		err = -ENOMEM;
 		wpa = fuse_writepage_args_setup(folio, data->ff);
-		if (!wpa) {
-			folio_put(tmp_folio);
+		if (!wpa)
 			goto out_unlock;
-		}
 		fuse_file_get(wpa->ia.ff);
 		data->max_folios = 1;
 		ap = &wpa->ia.ap;
 	}
 
 	folio_start_writeback(folio);
-	fuse_writepage_args_page_fill(wpa, folio, tmp_folio, ap->num_folios);
-	data->orig_folios[ap->num_folios] = folio;
+	fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
 
 	err = 0;
-	if (data->wpa) {
-		/*
-		 * Protected by fi->lock against concurrent access by
-		 * fuse_page_is_writeback().
-		 */
-		spin_lock(&fi->lock);
-		ap->num_folios++;
-		spin_unlock(&fi->lock);
-	} else if (fuse_writepage_add(wpa, folio)) {
+	ap->num_folios++;
+	if (!data->wpa)
 		data->wpa = wpa;
-	} else {
-		folio_end_writeback(folio);
-	}
 out_unlock:
 	folio_unlock(folio);
 
@@ -2441,13 +2170,6 @@ static int fuse_writepages(struct address_space *mapping,
 	data.wpa = NULL;
 	data.ff = NULL;
 
-	err = -ENOMEM;
-	data.orig_folios = kcalloc(fc->max_pages,
-				   sizeof(struct folio *),
-				   GFP_NOFS);
-	if (!data.orig_folios)
-		goto out;
-
 	err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
 	if (data.wpa) {
 		WARN_ON(!data.wpa->ia.ap.num_folios);
@@ -2456,7 +2178,6 @@ static int fuse_writepages(struct address_space *mapping,
 	if (data.ff)
 		fuse_file_put(data.ff, false);
 
-	kfree(data.orig_folios);
 out:
 	return err;
 }
@@ -2481,7 +2202,7 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 	if (IS_ERR(folio))
 		goto error;
 
-	fuse_wait_on_page_writeback(mapping->host, folio->index);
+	folio_wait_writeback(folio);
 	if (folio_test_uptodate(folio) || len >= folio_size(folio))
 		goto success;
 
@@ -2545,13 +2266,11 @@ static int fuse_launder_folio(struct folio *folio)
 {
 	int err = 0;
 	if (folio_clear_dirty_for_io(folio)) {
-		struct inode *inode = folio->mapping->host;
-
 		/* Serialize with pending writeback for the same page */
-		fuse_wait_on_page_writeback(inode, folio->index);
+		folio_wait_writeback(folio);
 		err = fuse_writepage_locked(folio);
 		if (!err)
-			fuse_wait_on_page_writeback(inode, folio->index);
+			folio_wait_writeback(folio);
 	}
 	return err;
 }
@@ -2595,7 +2314,7 @@ static vm_fault_t fuse_page_mkwrite(struct vm_fault *vmf)
 		return VM_FAULT_NOPAGE;
 	}
 
-	fuse_wait_on_folio_writeback(inode, folio);
+	folio_wait_writeback(folio);
 	return VM_FAULT_LOCKED;
 }
 
@@ -3413,9 +3132,12 @@ static const struct address_space_operations fuse_file_aops = {
 void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
+	if (fc->writeback_cache)
+		mapping_set_writeback_may_block(&inode->i_data);
 
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
@@ -3423,7 +3145,6 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	fi->iocachectr = 0;
 	init_waitqueue_head(&fi->page_waitq);
 	init_waitqueue_head(&fi->direct_io_waitq);
-	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
 		fuse_dax_inode_init(inode, flags);
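The reclaim-side rule this patch relies on can be sketched as a minimal standalone model. The names (`mapping_model`, `reclaim_may_wait_on_writeback`) and the flag bit value are illustrative assumptions, not the kernel's real definitions:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model (not kernel code): reclaim may only wait for a
 * folio's writeback when the owning mapping has not set
 * AS_WRITEBACK_MAY_BLOCK.  With the flag set on FUSE writeback-cache
 * inode mappings, reclaim never blocks on a FUSE server that may
 * itself be stuck in reclaim -- the deadlock the tmp-folio copy used
 * to work around.
 */
enum { WRITEBACK_MAY_BLOCK_FLAG = 1u << 0 };    /* illustrative bit value */

struct mapping_model { unsigned int flags; };

static bool reclaim_may_wait_on_writeback(const struct mapping_model *m)
{
        return !(m->flags & WRITEBACK_MAY_BLOCK_FLAG);
}
```

Under this model, a FUSE writeback-cache mapping (flag set) is skipped by waiting reclaim, while an ordinary mapping still allows the wait, which is why the temporary folio, the extra copy, and the rb tree tracking can all be dropped.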