From patchwork Fri Apr 4 18:14:40 2025
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 14038727
From: Joanne Koong
To: miklos@szeredi.hu, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Cc: jefflexu@linux.alibaba.com, shakeel.butt@linux.dev, david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org, kernel-team@meta.com
Subject: [PATCH v7 0/3] fuse: remove temp page copies in writeback
Date: Fri, 4 Apr 2025 11:14:40 -0700
Message-ID: <20250404181443.1363005-1-joannelkoong@gmail.com>

The purpose of this patchset is to help make writeback in FUSE filesystems as fast as possible.

In the current FUSE writeback design (see commit 3be5a52b30aa ("fuse: support writable mmap")), a temp page is allocated for every dirty page to be written back, the contents of the dirty page are copied over to the temp page, and the temp page gets handed to the server to write back. This is done so that writeback may be immediately cleared on the dirty page, and this in turn is done in order to mitigate the following deadlock scenario that may arise if reclaim waits on writeback on the dirty page to complete (more details can be found in this thread [1]):

* single-threaded FUSE server is in the middle of handling a request that needs a memory allocation
* memory allocation triggers direct reclaim
* direct reclaim waits on a folio under writeback
* the FUSE server can't write back the folio since it's stuck in direct reclaim

Allocating and copying dirty pages to temp pages is the biggest performance bottleneck for FUSE writeback. This patchset aims to get rid of the temp page altogether (which will also allow us to get rid of the internal FUSE rb tree that is needed to keep track of writeback status on the temp pages).

Benchmarks show approximately a 20% improvement in throughput for 4k block-size writes and a 45% improvement for 1M block-size writes.

In the current reclaim code, there is one scenario where writeback is waited on, which is the case where the system is running legacy cgroup v1 and reclaim encounters a folio that already has the reclaim flag set and the caller did not have __GFP_FS (or __GFP_IO if swap) set. This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which filesystems may set on their inode mappings to indicate that writeback operations may take an indeterminate amount of time to complete. FUSE will set this flag on its mappings. Reclaim for the legacy cgroup v1 case described above will skip reclaim of folios with that flag set.
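
To make the effect of the new flag concrete, here is a small user-space model of the reclaim decision. It is only a sketch under stated assumptions, not the code in mm/vmscan.c or include/linux/pagemap.h: every `model_*` / `MODEL_*` name is hypothetical, the GFP-based part of the condition is collapsed into a single boolean (`gfp_permits_wait`), and all other reclaim paths are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-mapping flag bit. */
enum model_mapping_flags {
	MODEL_AS_WRITEBACK_INDETERMINATE = 1u << 0,
};

struct model_mapping {
	unsigned long flags;
};

struct model_folio {
	const struct model_mapping *mapping;	/* may be NULL */
	bool under_writeback;
	bool reclaim_flag;	/* already marked for immediate reclaim */
};

struct model_reclaim_ctx {
	bool legacy_memcg;	/* reclaim on behalf of a cgroup v1 memcg */
	bool gfp_permits_wait;	/* assumption: GFP check reduced to one bool */
};

enum model_action { MODEL_PROCEED, MODEL_SKIP, MODEL_WAIT };

/* What a filesystem such as FUSE would do once per inode mapping. */
static void model_set_writeback_indeterminate(struct model_mapping *m)
{
	assert(m != NULL);
	m->flags |= MODEL_AS_WRITEBACK_INDETERMINATE;
}

static bool model_writeback_indeterminate(const struct model_mapping *m)
{
	return m != NULL && (m->flags & MODEL_AS_WRITEBACK_INDETERMINATE) != 0;
}

/* Decide what reclaim does with one folio in this simplified model. */
static enum model_action
model_reclaim_decision(const struct model_folio *f,
		       const struct model_reclaim_ctx *c)
{
	assert(f != NULL && c != NULL);

	if (!f->under_writeback)
		return MODEL_PROCEED;

	/* Outside the single "wait" scenario, reclaim never blocks. */
	if (!c->legacy_memcg || !f->reclaim_flag || !c->gfp_permits_wait)
		return MODEL_SKIP;

	/* New rule: do not wait on writeback of indeterminate duration. */
	if (model_writeback_indeterminate(f->mapping))
		return MODEL_SKIP;

	return MODEL_WAIT;
}
```

In this model, a folio under writeback whose mapping has the flag set is skipped in the one scenario where reclaim would otherwise wait, while a folio from an unflagged mapping still leads to a wait.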

With this change, writeback state is now only cleared on the dirty page after the server has written it back to disk. If the server is deliberately malicious or well-intentioned but buggy, this may stall sync(2) and page migration. For sync(2), however, a malicious server may already stall it by not replying to the FUSE_SYNCFS request, and for page migration, there are already many easier ways to stall it by having FUSE permanently hold the folio lock. A fuller discussion on this can be found in [2]. Long-term, there needs to be a more comprehensive solution for addressing migration of FUSE pages that handles all scenarios where FUSE may permanently hold the lock, but that is outside the scope of this patchset and will be done as future work. Please also note that this change now ensures that when sync(2) returns, FUSE filesystems will have persisted writeback changes.

[1] https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com/
[2] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/

Changelog
---------
v6: https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/
Changes from v6 -> v7:
* Drop migration and sync patches, as they are useless if a server is determined to be malicious

v5: https://lore.kernel.org/linux-fsdevel/20241115224459.427610-1-joannelkoong@gmail.com/
Changes from v5 -> v6:
* Add Shakeel and Jingbo's reviewed-bys
* Move folio_end_writeback() to fuse_writepage_finish() (Jingbo)
* Embed fuse_writepage_finish_stat() logic inline (Jingbo)
* Remove node_stat NR_WRITEBACK inc/sub (Jingbo)

v4: https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@gmail.com/
Changes from v4 -> v5:
* AS_WRITEBACK_MAY_BLOCK -> AS_WRITEBACK_INDETERMINATE (Shakeel)
* Drop memory hotplug patch (David and Shakeel)
* Remove some more unnecessary writeback waits in fuse code (Jingbo)
* Make commit message for reclaim patch more concise - drop part about
  deadlock and just focus on how it may stall waits

v3: https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
* Use filemap_fdatawait_range() instead of filemap_range_has_writeback() in readahead

v2: https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
* Account for sync and page migration cases as well (Miklos)
* Change AS_NO_WRITEBACK_RECLAIM to the more generic AS_WRITEBACK_MAY_BLOCK
* For fuse inodes, set mapping_writeback_may_block only if fc->writeback_cache is enabled

v1: https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@gmail.com/T/#t
Changes from v1 -> v2:
* Have flag in "enum mapping_flags" instead of creating asop_flags (Shakeel)
* Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)

Joanne Koong (3):
  mm: add AS_WRITEBACK_INDETERMINATE mapping flag
  mm: skip reclaiming folios in legacy memcg writeback indeterminate contexts
  fuse: remove tmp folio for writebacks and internal rb tree

 fs/fuse/file.c          | 360 ++++------------------------------------
 fs/fuse/fuse_i.h        |   3 -
 include/linux/pagemap.h |  11 ++
 mm/vmscan.c             |  10 +-
 4 files changed, 46 insertions(+), 338 deletions(-)

Reviewed-by: Jeff Layton