From patchwork Mon Nov 11 23:37:37 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13871493
From: Jens Axboe
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org,
 willy@infradead.org, kirill@shutemov.name, linux-btrfs@vger.kernel.org,
 linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, Jens Axboe
Subject: [PATCH 10/16] mm/filemap: make buffered writes work with RWF_UNCACHED
Date: Mon, 11 Nov 2024 16:37:37 -0700
Message-ID: <20241111234842.2024180-11-axboe@kernel.dk>
In-Reply-To: <20241111234842.2024180-1-axboe@kernel.dk>
References: <20241111234842.2024180-1-axboe@kernel.dk>

If RWF_UNCACHED is set for a write, mark new folios being written as
uncached. This is done by passing the fact that it's an uncached write
in through the folio pointer. We can only get there when IOCB_UNCACHED
was allowed, which can only happen if the file system opts in. Opting in
means the file system needs to check whether the folio pointer it is
handed holds the magic foliop_uncached value (foliop_is_uncached()) to
know if it's an uncached write or not.
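A minimal userspace sketch of this sentinel-pointer trick, assuming a
simplified stand-in for struct folio: the caller stores the magic
foliop_uncached value in the folio out-parameter before calling a
->write_begin()-style handler, and the handler tests for it before
overwriting the pointer with a real folio. struct folio as defined here,
its uncached field, and demo_write_begin() are hypothetical; only the two
macros mirror the ones this patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct folio. */
struct folio {
	bool uncached;	/* set when the folio was created by an uncached write */
};

/* Sentinel value and test, mirroring the macros added to pagemap.h. */
#define foliop_uncached ((struct folio *) 0xfee1c001)
#define foliop_is_uncached(foliop) (*(foliop) == foliop_uncached)

/*
 * Hypothetical ->write_begin()-style handler. The sentinel is only ever
 * compared, never dereferenced; the handler replaces it with a real folio.
 * In a real file system, this is where FGP_UNCACHED would be passed when a
 * new folio has to be created. The caller owns (and frees) the result.
 */
static int demo_write_begin(struct folio **foliop)
{
	struct folio *folio;

	assert(foliop != NULL);
	folio = malloc(sizeof(*folio));
	if (!folio)
		return -1;
	folio->uncached = foliop_is_uncached(foliop);
	*foliop = folio;
	return 0;
}
```

A caller starts with folio = NULL for a regular write, or folio =
foliop_uncached for an uncached one, as generic_perform_write() does in
the diff; after a successful call the pointer refers to a real object
either way, so the sentinel never escapes the handler.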
If the write is uncached, FGP_UNCACHED should be used when new folios
need to be created. Uncached writes will drop any folios they create upon
writeback completion, but leave alone any folios that may already exist
in that range.

Since ->write_begin() doesn't currently take any flags, and to avoid
needing to change the callback kernel wide, use the foliop being passed
in to ->write_begin() to signal whether this is an uncached write. File
systems can then use that to mark newly created folios as uncached.

Add a helper, generic_uncached_write(), that generic_file_write_iter()
calls upon successful completion of an uncached write.

This provides similar benefits to using RWF_UNCACHED with reads. Testing
buffered writes on 32 files:

writing bs 65536, uncached 0
1s: 196035MB/sec
2s: 132308MB/sec
3s: 132438MB/sec
4s: 116528MB/sec
5s: 103898MB/sec
6s: 108893MB/sec
7s: 99678MB/sec
8s: 106545MB/sec
9s: 106826MB/sec
10s: 101544MB/sec
11s: 111044MB/sec
12s: 124257MB/sec
13s: 116031MB/sec
14s: 114540MB/sec
15s: 115011MB/sec
16s: 115260MB/sec
17s: 116068MB/sec
18s: 116096MB/sec

where it's quite obvious when the page cache filled up, and performance
dropped to about half of where it started, settling in at around
115GB/sec. Meanwhile, 32 kswapds were running full steam trying to
reclaim pages.

Running the same test with uncached buffered writes:

writing bs 65536, uncached 1
1s: 198974MB/sec
2s: 189618MB/sec
3s: 193601MB/sec
4s: 188582MB/sec
5s: 193487MB/sec
6s: 188341MB/sec
7s: 194325MB/sec
8s: 188114MB/sec
9s: 192740MB/sec
10s: 189206MB/sec
11s: 193442MB/sec
12s: 189659MB/sec
13s: 191732MB/sec
14s: 190701MB/sec
15s: 191789MB/sec
16s: 191259MB/sec
17s: 190613MB/sec
18s: 191951MB/sec

and the behavior is fully predictable, performing the same throughout
even after the page cache would otherwise have fully filled with dirty
data. It's also about 65% faster, and uses half the system's CPU
compared to the normal buffered write.
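The "about 65% faster" figure can be cross-checked against the samples
quoted above. This small C sketch assumes seconds 12 to 18 as the settled
window for both runs (an editorial choice, not something the benchmark
defines), averages each run over that window, and computes the relative
speedup; with these samples it comes out at roughly 64%. mean() and
speedup_pct() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput samples (MB/sec) for seconds 12..18 of the two runs above. */
static const double cached_mbps[] = {
	124257, 116031, 114540, 115011, 115260, 116068, 116096,
};
static const double uncached_mbps[] = {
	189659, 191732, 190701, 191789, 191259, 190613, 191951,
};

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Speedup of the uncached run over the cached run, in percent. */
static double speedup_pct(void)
{
	size_t n = sizeof(cached_mbps) / sizeof(cached_mbps[0]);

	return (mean(uncached_mbps, n) / mean(cached_mbps, n) - 1.0) * 100.0;
}
```

Averaging over all 18 seconds instead gives a slightly lower figure,
since the cached run starts out fast before the page cache fills.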

Signed-off-by: Jens Axboe
---
 include/linux/pagemap.h | 29 +++++++++++++++++++++++++++++
 mm/filemap.c            | 17 +++++++++++++++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d55bf995bd9e..d35280744aa1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,7 @@
 #include
 #include
 #include /* for in_interrupt() */
+#include
 #include
 
 struct folio_batch;
@@ -70,6 +71,34 @@ static inline int filemap_write_and_wait(struct address_space *mapping)
 	return filemap_write_and_wait_range(mapping, 0, LLONG_MAX);
 }
 
+/*
+ * generic_uncached_write - start uncached writeback
+ * @iocb: the iocb that was written
+ * @written: the number of bytes written
+ *
+ * Once ->write_iter() has completed the write, this helper should be called
+ * if the file system supports uncached writes. If %IOCB_UNCACHED is set, it
+ * will kick off writeback for the range that was just written.
+ */
+static inline void generic_uncached_write(struct kiocb *iocb, ssize_t written)
+{
+	if (iocb->ki_flags & IOCB_UNCACHED) {
+		struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+		/* ki_pos is already past the write; the end offset is inclusive */
+		__filemap_fdatawrite_range(mapping, iocb->ki_pos - written,
+					   iocb->ki_pos - 1, WB_SYNC_NONE);
+	}
+}
+
+/*
+ * Value passed in to ->write_begin() if IOCB_UNCACHED is set for the write,
+ * and the ->write_begin() handler on a file system supporting FOP_UNCACHED
+ * must check for this and pass FGP_UNCACHED for folio creation.
+ */
+#define foliop_uncached ((struct folio *) 0xfee1c001)
+#define foliop_is_uncached(foliop) (*(foliop) == foliop_uncached)
+
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
  * @mapping: mapping in which to set writeback error
diff --git a/mm/filemap.c b/mm/filemap.c
index 40debe742abe..0d312de4e20c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -430,6 +430,7 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start,
 
 	return filemap_fdatawrite_wbc(mapping, &wbc);
 }
+EXPORT_SYMBOL_GPL(__filemap_fdatawrite_range);
 
 static inline int __filemap_fdatawrite(struct address_space *mapping,
 		int sync_mode)
@@ -4076,7 +4077,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	ssize_t written = 0;
 
 	do {
-		struct folio *folio;
+		struct folio *folio = NULL;
 		size_t offset;		/* Offset into folio */
 		size_t bytes;		/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
@@ -4104,6 +4105,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 			break;
 		}
 
+		/*
+		 * If IOCB_UNCACHED is set here, we know the file system
+		 * supports it. And hence it'll know to check whether *foliop
+		 * holds this magic value. If so, it's an uncached write.
+		 * Whenever ->write_begin() changes prototypes again, this
+		 * can go away and just pass iocb or iocb flags.
+		 */
+		if (iocb->ki_flags & IOCB_UNCACHED)
+			folio = foliop_uncached;
+
 		status = a_ops->write_begin(file, mapping, pos, bytes,
 						&folio, &fsdata);
 		if (unlikely(status < 0))
@@ -4234,8 +4245,10 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ret = __generic_file_write_iter(iocb, from);
 	inode_unlock(inode);
 
-	if (ret > 0)
+	if (ret > 0) {
+		generic_uncached_write(iocb, ret);
 		ret = generic_write_sync(iocb, ret);
+	}
 	return ret;
 }
 EXPORT_SYMBOL(generic_file_write_iter);
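
For reference when reading generic_uncached_write(): in this kernel
generation, generic_perform_write() advances iocb->ki_pos by the number
of bytes it wrote before returning, and __filemap_fdatawrite_range()
takes an inclusive end offset. Under those two assumptions, the range
just written runs from ki_pos - written to ki_pos - 1. The userspace
sketch that follows shows only that arithmetic; struct byte_range and
written_range() are hypothetical names, and long long stands in for the
kernel's loff_t.

```c
#include <assert.h>

/* Inclusive byte range, matching the end-offset convention of
 * __filemap_fdatawrite_range(). */
struct byte_range {
	long long start;
	long long end;
};

/*
 * Hypothetical helper: given the file position *after* a buffered write
 * (ki_pos has already been advanced by the bytes written) and the byte
 * count, return the range that was actually written.
 */
static struct byte_range written_range(long long pos_after, long long written)
{
	struct byte_range r;

	assert(written > 0 && pos_after >= written);
	r.start = pos_after - written;
	r.end = pos_after - 1;	/* inclusive end */
	return r;
}
```

For example, a 64KiB write at offset 0 leaves ki_pos at 65536, so the
range to kick writeback on is bytes 0 through 65535; using ki_pos and
ki_pos + written instead would target the next, not-yet-written 64KiB.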