[10/13] mm/filemap: make buffered writes work with RWF_UNCACHED

Message ID 20241108174505.1214230-11-axboe@kernel.dk (mailing list archive)
State New
Series [01/13] mm/filemap: change filemap_create_folio() to take a struct kiocb

Commit Message

Jens Axboe Nov. 8, 2024, 5:43 p.m. UTC
If RWF_UNCACHED is set for a write, mark the folios being written as
uncached (folio_set_uncached()). Writeback completion will then drop the
pages. The write_iter handler simply kicks off writeback for the pages,
and writeback completion takes care of the rest.
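
From userspace, the flag would be passed through pwritev2(2). The helper below is a minimal sketch of that, not part of the patch. The name write_uncached() is hypothetical, and the numeric value used for RWF_UNCACHED when the headers do not define it is an assumption that should be checked against the uapi header of a kernel carrying this series. The helper falls back to a plain buffered write if the kernel or filesystem rejects the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Assumption: this value is not stated in the patch. Verify it against
 * include/uapi/linux/fs.h of the kernel you are running.
 */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080
#endif

/*
 * Hypothetical helper: write len bytes at offset off, asking for an
 * uncached buffered write. If the flag is rejected, retry as a normal
 * buffered write. *used_uncached reports which path succeeded.
 */
static ssize_t write_uncached(int fd, const void *buf, size_t len,
			      off_t off, bool *used_uncached)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL && used_uncached != NULL);

	ret = pwritev2(fd, &iov, 1, off, RWF_UNCACHED);
	if (ret >= 0) {
		*used_uncached = true;
		return ret;
	}
	/* Any error other than "flag not supported" is a real failure */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return ret;

	*used_uncached = false;
	return pwritev2(fd, &iov, 1, off, 0);
}
```

A caller opens the file for buffered I/O as usual (no O_DIRECT, no alignment requirements) and only swaps the write call. Short writes still need the usual retry loop.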

This provides similar benefits to using RWF_UNCACHED with reads. Testing
buffered writes on 32 files:

writing bs 65536, uncached 0
  1s: 196035MB/sec, MB=196035
  2s: 132308MB/sec, MB=328147
  3s: 132438MB/sec, MB=460586
  4s: 116528MB/sec, MB=577115
  5s: 103898MB/sec, MB=681014
  6s: 108893MB/sec, MB=789907
  7s: 99678MB/sec, MB=889586
  8s: 106545MB/sec, MB=996132
  9s: 106826MB/sec, MB=1102958
 10s: 101544MB/sec, MB=1204503
 11s: 111044MB/sec, MB=1315548
 12s: 124257MB/sec, MB=1441121
 13s: 116031MB/sec, MB=1557153
 14s: 114540MB/sec, MB=1671694
 15s: 115011MB/sec, MB=1786705
 16s: 115260MB/sec, MB=1901966
 17s: 116068MB/sec, MB=2018034
 18s: 116096MB/sec, MB=2134131

where it's quite obvious where the page cache filled: performance
dropped from about 196GB/sec to not much more than half of that,
settling in at around 115GB/sec. Meanwhile, 32 kswapds were running
full steam trying to reclaim pages.

Running the same test with uncached buffered writes:

writing bs 65536, uncached 1
  1s: 198974MB/sec
  2s: 189618MB/sec
  3s: 193601MB/sec
  4s: 188582MB/sec
  5s: 193487MB/sec
  6s: 188341MB/sec
  7s: 194325MB/sec
  8s: 188114MB/sec
  9s: 192740MB/sec
 10s: 189206MB/sec
 11s: 193442MB/sec
 12s: 189659MB/sec
 13s: 191732MB/sec
 14s: 190701MB/sec
 15s: 191789MB/sec
 16s: 191259MB/sec
 17s: 190613MB/sec
 18s: 191951MB/sec

and the behavior is fully predictable, performing the same throughout
even after the page cache would otherwise have fully filled with dirty
data. It's also about 65% faster, and using half the CPU of the system
compared to the normal buffered write.
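
The "about 65% faster" figure can be checked against the two runs above. The sketch below averages the last six samples (seconds 13 to 18) of each run and compares them. Picking that window as the steady state is an assumption made here, not something the commit message states, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Seconds 13..18 of each run, in MB/sec, copied from the output above */
static const double cached_mb_s[] = {
	116031, 114540, 115011, 115260, 116068, 116096,
};
static const double uncached_mb_s[] = {
	191732, 190701, 191789, 191259, 190613, 191951,
};

/* Arithmetic mean of n samples */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* How much faster "improved" is than "base", in percent */
static double speedup_percent(double base, double improved)
{
	return (improved / base - 1.0) * 100.0;
}
```

With these samples the cached run averages a bit over 115GB/sec and the uncached run a bit over 191GB/sec, so speedup_percent() comes out close to the 65% quoted.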

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 mm/filemap.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 1e455ca872b5..d4c5928c5e2a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1610,6 +1610,8 @@  EXPORT_SYMBOL(folio_wait_private_2_killable);
  */
 void folio_end_writeback(struct folio *folio)
 {
+	bool folio_uncached;
+
 	VM_BUG_ON_FOLIO(!folio_test_writeback(folio), folio);
 
 	/*
@@ -1631,6 +1633,7 @@  void folio_end_writeback(struct folio *folio)
 	 * reused before the folio_wake_bit().
 	 */
 	folio_get(folio);
+	folio_uncached = folio_test_clear_uncached(folio);
 	if (__folio_end_writeback(folio))
 		folio_wake_bit(folio, PG_writeback);
 	acct_reclaim_writeback(folio);
@@ -1639,12 +1642,10 @@  void folio_end_writeback(struct folio *folio)
 	 * If folio is marked as uncached, then pages should be dropped when
 	 * writeback completes. Do that now.
 	 */
-	if (folio_test_uncached(folio)) {
-		folio_lock(folio);
-		if (invalidate_complete_folio2(folio->mapping, folio, 0))
-			folio_clear_uncached(folio);
+	if (folio_uncached && folio_trylock(folio)) {
+		if (folio->mapping)
+			invalidate_complete_folio2(folio->mapping, folio, 0);
 		folio_unlock(folio);
-
 	}
 	folio_put(folio);
 }
@@ -4082,6 +4083,9 @@  ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 		if (unlikely(status < 0))
 			break;
 
+		if (iocb->ki_flags & IOCB_UNCACHED)
+			folio_set_uncached(folio);
+
 		offset = offset_in_folio(folio, pos);
 		if (bytes > folio_size(folio) - offset)
 			bytes = folio_size(folio) - offset;
@@ -4122,6 +4126,12 @@  ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 
 	if (!written)
 		return status;
+	if (iocb->ki_flags & IOCB_UNCACHED) {
+		/* kick off uncached writeback, completion will drop it */
+		__filemap_fdatawrite_range(mapping, iocb->ki_pos,
+						iocb->ki_pos + written,
+						WB_SYNC_NONE);
+	}
 	iocb->ki_pos += written;
 	return written;
 }
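
The reworked completion path in folio_end_writeback() can be modelled in userspace to make the ordering easier to follow. This is only a sketch under simplifying assumptions: struct model_folio and the model_* helpers are hypothetical stand-ins, not kernel APIs, and the drop is modelled as always succeeding, whereas invalidate_complete_folio2() can decline to drop a folio.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few folio bits this path cares about */
struct model_folio {
	atomic_bool uncached;	/* models the uncached folio flag */
	atomic_bool locked;	/* models the folio lock */
	void *mapping;		/* NULL once the folio was truncated */
	bool in_cache;		/* still present in the page cache? */
};

static bool model_test_clear_uncached(struct model_folio *f)
{
	return atomic_exchange(&f->uncached, false);
}

static bool model_trylock(struct model_folio *f)
{
	return !atomic_exchange(&f->locked, true);
}

static void model_unlock(struct model_folio *f)
{
	atomic_store(&f->locked, false);
}

/* Returns true if the folio was dropped from the (modelled) cache */
static bool model_end_writeback(struct model_folio *f)
{
	bool dropped = false;
	/* Sample and clear the flag before writeback is marked done */
	bool uncached = model_test_clear_uncached(f);

	/* ... end writeback and wake waiters here ... */

	/* Never block in completion: skip the drop if the lock is held */
	if (uncached && model_trylock(f)) {
		if (f->mapping) {	/* may have been truncated meanwhile */
			f->in_cache = false;
			dropped = true;
		}
		model_unlock(f);
	}
	return dropped;
}
```

In this model the flag is cleared even when the trylock fails, so a folio that is locked by someone else at completion time simply stays in the cache as a regular folio.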