Message ID: 20211208042256.1923824-1-willy@infradead.org
Series: Folios for 5.17
Consolidated multiple review comments into one email; the majority are nits
at best:

[PATCH 04/48]:

An obnoxiously pedantic English grammar nit:

+ * lock). This can also be called from mark_buffer_dirty(), which I

The period should be inside the paren, e.g.: "lock.)"

[PATCH 05/48]:

+	unsigned char aux[3];

I'd like to see an explanation of why this is "3."

+static inline void folio_batch_init(struct folio_batch *fbatch)
+{
+	fbatch->nr = 0;
+}
+
+static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
+{
+	return fbatch->nr;
+}
+
+static inline unsigned int fbatch_space(struct folio_batch *fbatch)
+{
+	return PAGEVEC_SIZE - fbatch->nr;
+}
+
+/**
+ * folio_batch_add() - Add a folio to a batch.
+ * @fbatch: The folio batch.
+ * @folio: The folio to add.
+ *
+ * The folio is added to the end of the batch.
+ * The batch must have previously been initialised using folio_batch_init().
+ *
+ * Return: The number of slots still available.
+ */
+static inline unsigned folio_batch_add(struct folio_batch *fbatch,
+		struct folio *folio)
+{
+	fbatch->folios[fbatch->nr++] = folio;

Is there any need to validate fbatch in these inlines?

[PATCH 07/48]:

+	xas_for_each(&xas, folio, ULONG_MAX) {		\
		unsigned left;				\
-		if (xas_retry(&xas, head))		\
+		size_t offset = offset_in_folio(folio, start + __off);	\
+		if (xas_retry(&xas, folio))		\
			continue;			\
-		if (WARN_ON(xa_is_value(head)))		\
+		if (WARN_ON(xa_is_value(folio)))	\
			break;				\
-		if (WARN_ON(PageHuge(head)))		\
+		if (WARN_ON(folio_test_hugetlb(folio)))	\
			break;				\
-		for (j = (head->index < index) ? index - head->index : 0; \
-		     j < thp_nr_pages(head); j++) {	\
-			void *kaddr = kmap_local_page(head + j);	\
-			base = kaddr + offset;		\
-			len = PAGE_SIZE - offset;	\
+		while (offset < folio_size(folio)) {	\

Since offset is not actually used until after a bunch of error conditions
may exit or restart the loop, and isn't used at all in between, defer its
calculation until just before its first use in the "while."

[PATCH 25/48]:

Whether you use your follow-on patch sent to Christoph or defer it until
later as mentioned in your followup email, either approach is fine with me.

Otherwise it all looks good.

For the series:

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On Dec 7, 2021, at 9:22 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
>
> I was trying to get all the way to adding large folios to the page cache,
> but I ran out of time.  I think the changes in this batch of patches
> are worth adding, even without large folio support for filesystems other
> than tmpfs.
>
> The big change here is the last patch which switches from storing
> large folios in the page cache as 2^N identical entries to using the
> XArray support for multi-index entries.  As the changelog says, this
> fixes a bug that can occur when doing (eg) msync() or sync_file_range().
>
> Some parts of this have been sent before.  The first patch is already
> in Andrew's tree, but I included it here so the build bots don't freak
> out about not being able to apply this patch series.  The folio_batch
> (replacement for pagevec) is new.  I also fixed a few bugs.
>
> This all passes xfstests with no new failures on both xfs and tmpfs.
> I intend to put all this into for-next tomorrow.
>
> Matthew Wilcox (Oracle) (48):
>  filemap: Remove PageHWPoison check from next_uptodate_page()
>  fs/writeback: Convert inode_switch_wbs_work_fn to folios
>  mm/doc: Add documentation for folio_test_uptodate
>  mm/writeback: Improve __folio_mark_dirty() comment
>  pagevec: Add folio_batch
>  iov_iter: Add copy_folio_to_iter()
>  iov_iter: Convert iter_xarray to use folios
>  mm: Add folio_test_pmd_mappable()
>  filemap: Add folio_put_wait_locked()
>  filemap: Convert page_cache_delete to take a folio
>  filemap: Add filemap_unaccount_folio()
>  filemap: Convert tracing of page cache operations to folio
>  filemap: Add filemap_remove_folio and __filemap_remove_folio
>  filemap: Convert find_get_entry to return a folio
>  filemap: Remove thp_contains()
>  filemap: Convert filemap_get_read_batch to use folios
>  filemap: Convert find_get_pages_contig to folios
>  filemap: Convert filemap_read_page to take a folio
>  filemap: Convert filemap_create_page to folio
>  filemap: Convert filemap_range_uptodate to folios
>  readahead: Convert page_cache_async_ra() to take a folio
>  readahead: Convert page_cache_ra_unbounded to folios
>  filemap: Convert do_async_mmap_readahead to take a folio
>  filemap: Convert filemap_fault to folio
>  filemap: Add read_cache_folio and read_mapping_folio
>  filemap: Convert filemap_get_pages to use folios
>  filemap: Convert page_cache_delete_batch to folios
>  filemap: Use folios in next_uptodate_page
>  filemap: Use a folio in filemap_map_pages
>  filemap: Use a folio in filemap_page_mkwrite
>  filemap: Add filemap_release_folio()
>  truncate: Add truncate_cleanup_folio()
>  mm: Add unmap_mapping_folio()
>  shmem: Convert part of shmem_undo_range() to use a folio
>  truncate,shmem: Add truncate_inode_folio()
>  truncate: Skip known-truncated indices
>  truncate: Convert invalidate_inode_pages2_range() to use a folio
>  truncate: Add invalidate_complete_folio2()
>  filemap: Convert filemap_read() to use a folio
>  filemap: Convert filemap_get_read_batch() to use a folio_batch
>  filemap: Return only folios from find_get_entries()
>  mm: Convert find_lock_entries() to use a folio_batch
>  mm: Remove pagevec_remove_exceptionals()
>  fs: Convert vfs_dedupe_file_range_compare to folios
>  truncate: Convert invalidate_inode_pages2_range to folios
>  truncate,shmem: Handle truncates that split large folios
>  XArray: Add xas_advance()
>  mm: Use multi-index entries in the page cache
>
>  fs/fs-writeback.c              |  24 +-
>  fs/remap_range.c               | 116 ++--
>  include/linux/huge_mm.h        |  14 +
>  include/linux/mm.h             |  68 +--
>  include/linux/page-flags.h     |  13 +-
>  include/linux/pagemap.h        |  59 +-
>  include/linux/pagevec.h        |  61 ++-
>  include/linux/uio.h            |   7 +
>  include/linux/xarray.h         |  18 +
>  include/trace/events/filemap.h |  32 +-
>  lib/iov_iter.c                 |  29 +-
>  lib/xarray.c                   |   6 +-
>  mm/filemap.c                   | 967 ++++++++++++++++-----------------
>  mm/folio-compat.c              |  11 +
>  mm/huge_memory.c               |  20 +-
>  mm/internal.h                  |  35 +-
>  mm/khugepaged.c                |  12 +-
>  mm/memory.c                    |  27 +-
>  mm/migrate.c                   |  29 +-
>  mm/page-writeback.c            |   6 +-
>  mm/readahead.c                 |  24 +-
>  mm/shmem.c                     | 174 +++---
>  mm/swap.c                      |  26 +-
>  mm/truncate.c                  | 305 ++++++-----
>  24 files changed, 1100 insertions(+), 983 deletions(-)
>
> --
> 2.33.0
>
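The folio_batch helpers questioned in the review above follow the usual fixed-capacity batch pattern. As a rough, standalone illustration of that pattern (plain userspace C with stand-in names such as item_batch and BATCH_SIZE, not the kernel definitions), the add path with the kind of cheap validation the reviewer asks about could look like this:

	#include <assert.h>

	#define BATCH_SIZE 15			/* stand-in for PAGEVEC_SIZE */

	struct item;				/* stand-in for struct folio */

	struct item_batch {
		unsigned char nr;
		struct item *items[BATCH_SIZE];
	};

	static inline void item_batch_init(struct item_batch *b)
	{
		b->nr = 0;
	}

	static inline unsigned int item_batch_count(const struct item_batch *b)
	{
		return b->nr;
	}

	/* Returns the number of slots still available after the add. */
	static inline unsigned int item_batch_add(struct item_batch *b,
						  struct item *item)
	{
		assert(b->nr < BATCH_SIZE);	/* the cheap check a caller might want */
		b->items[b->nr++] = item;
		return BATCH_SIZE - b->nr;
	}

The caller is expected to drain the batch when the returned slot count reaches zero, which is why the add helper reports remaining space rather than success or failure.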
On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> This all passes xfstests with no new failures on both xfs and tmpfs.
> I intend to put all this into for-next tomorrow.

As a result of Christoph's review, here's the diff.  I don't
think it's worth re-posting the entire patch series.

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 7d3494f7fb70..dda8d5868c81 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -18,6 +18,7 @@ struct page;
 struct folio;
 struct address_space;
 
+/* Layout must match folio_batch */
 struct pagevec {
 	unsigned char nr;
 	bool percpu_pvec_drained;
@@ -85,17 +86,22 @@ static inline void pagevec_release(struct pagevec *pvec)
  * struct folio_batch - A collection of folios.
  *
  * The folio_batch is used to amortise the cost of retrieving and
- * operating on a set of folios.  The order of folios in the batch is
- * not considered important.  Some users of the folio_batch store
- * "exceptional" entries in it which can be removed by calling
- * folio_batch_remove_exceptionals().
+ * operating on a set of folios.  The order of folios in the batch may be
+ * significant (eg delete_from_page_cache_batch()).  Some users of the
+ * folio_batch store "exceptional" entries in it which can be removed
+ * by calling folio_batch_remove_exceptionals().
  */
 struct folio_batch {
 	unsigned char nr;
-	unsigned char aux[3];
+	bool percpu_pvec_drained;
 	struct folio *folios[PAGEVEC_SIZE];
 };
 
+/* Layout must match pagevec */
+static_assert(sizeof(struct pagevec) == sizeof(struct folio_batch));
+static_assert(offsetof(struct pagevec, pages) ==
+		offsetof(struct folio_batch, folios));
+
 /**
  * folio_batch_init() - Initialise a batch of folios
  * @fbatch: The folio batch.
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 8479cf46b5b1..781d98d96611 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -150,7 +150,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
 static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 		size_t bytes, struct iov_iter *i)
 {
-	return copy_page_to_iter((struct page *)folio, offset, bytes, i);
+	return copy_page_to_iter(&folio->page, offset, bytes, i);
 }
 
 static __always_inline __must_check
diff --git a/mm/filemap.c b/mm/filemap.c
index 9b5b2d962c37..33077c264d79 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3451,45 +3451,12 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 	if (folio_test_uptodate(folio))
 		goto out;
 
-	/*
-	 * Page is not up to date and may be locked due to one of the following
-	 * case a: Page is being filled and the page lock is held
-	 * case b: Read/write error clearing the page uptodate status
-	 * case c: Truncation in progress (page locked)
-	 * case d: Reclaim in progress
-	 *
-	 * Case a, the page will be up to date when the page is unlocked.
-	 * There is no need to serialise on the page lock here as the page
-	 * is pinned so the lock gives no additional protection. Even if the
-	 * page is truncated, the data is still valid if PageUptodate as
-	 * it's a race vs truncate race.
-	 * Case b, the page will not be up to date
-	 * Case c, the page may be truncated but in itself, the data may still
-	 * be valid after IO completes as it's a read vs truncate race. The
-	 * operation must restart if the page is not uptodate on unlock but
-	 * otherwise serialising on page lock to stabilise the mapping gives
-	 * no additional guarantees to the caller as the page lock is
-	 * released before return.
-	 * Case d, similar to truncation. If reclaim holds the page lock, it
-	 * will be a race with remove_mapping that determines if the mapping
-	 * is valid on unlock but otherwise the data is valid and there is
-	 * no need to serialise with page lock.
-	 *
-	 * As the page lock gives no additional guarantee, we optimistically
-	 * wait on the page to be unlocked and check if it's up to date and
-	 * use the page if it is. Otherwise, the page lock is required to
-	 * distinguish between the different cases. The motivation is that we
-	 * avoid spurious serialisations and wakeups when multiple processes
-	 * wait on the same page for IO to complete.
-	 */
-	folio_wait_locked(folio);
-	if (folio_test_uptodate(folio))
-		goto out;
-
-	/* Distinguish between all the cases under the safety of the lock */
-	folio_lock(folio);
+	if (!folio_trylock(folio)) {
+		folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE);
+		goto repeat;
+	}
 
-	/* Case c or d, restart the operation */
+	/* Folio was truncated from mapping */
 	if (!folio->mapping) {
 		folio_unlock(folio);
 		folio_put(folio);
@@ -3543,7 +3510,9 @@ EXPORT_SYMBOL(read_cache_folio);
 static struct page *do_read_cache_page(struct address_space *mapping,
 		pgoff_t index, filler_t *filler, void *data, gfp_t gfp)
 {
-	struct folio *folio = read_cache_folio(mapping, index, filler, data);
+	struct folio *folio;
+
+	folio = do_read_cache_folio(mapping, index, filler, data, gfp);
 	if (IS_ERR(folio))
 		return &folio->page;
 	return folio_file_page(folio, index);
diff --git a/mm/shmem.c b/mm/shmem.c
index 735404fdfcc3..6cd3af0addc1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -942,18 +942,18 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			index++;
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
+	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
 	shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_READ);
 	if (folio) {
-		bool same_page;
+		bool same_folio;
 
-		same_page = lend < folio_pos(folio) + folio_size(folio);
-		if (same_page)
+		same_folio = lend < folio_pos(folio) + folio_size(folio);
+		if (same_folio)
 			partial_end = false;
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend)) {
 			start = folio->index + folio_nr_pages(folio);
-			if (same_page)
+			if (same_folio)
 				end = folio->index;
 		}
 		folio_unlock(folio);
diff --git a/mm/truncate.c b/mm/truncate.c
index 336c8d099efa..749aac71fda5 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -350,8 +350,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	pgoff_t		indices[PAGEVEC_SIZE];
 	pgoff_t		index;
 	int		i;
-	struct folio * folio;
-	bool partial_end;
+	struct folio	*folio;
+	bool		partial_end;
 
 	if (mapping_empty(mapping))
 		goto out;
@@ -388,7 +388,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		cond_resched();
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
+	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
 	folio = __filemap_get_folio(mapping, lstart >> PAGE_SHIFT, FGP_LOCK, 0);
 	if (folio) {
 		bool same_folio = lend < folio_pos(folio) + folio_size(folio);
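The static_assert()/offsetof() lines added to pagevec.h in the diff above are a compile-time check that two independently declared structures stay layout-compatible. A minimal standalone C11 illustration of the same idea, using hypothetical struct names rather than the kernel types:

	#include <assert.h>	/* static_assert (C11) */
	#include <stddef.h>	/* offsetof */

	struct old_vec {
		unsigned char nr;
		_Bool drained;
		void *pages[15];
	};

	struct new_batch {
		unsigned char nr;
		_Bool drained;
		void *folios[15];
	};

	/* The build fails if the two layouts ever drift apart. */
	static_assert(sizeof(struct old_vec) == sizeof(struct new_batch),
		      "old_vec and new_batch must have the same size");
	static_assert(offsetof(struct old_vec, pages) ==
		      offsetof(struct new_batch, folios),
		      "the array members must sit at the same offset");

Keeping the check next to the type definitions means a future field addition to one struct but not the other is caught at compile time rather than as silent memory corruption when one type is cast to the other.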
Looks good, addresses one of my comments as well.

Again:

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On Jan 2, 2022, at 9:19 AM, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
>> This all passes xfstests with no new failures on both xfs and tmpfs.
>> I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.
Sorry I missed this while travelling.

On Sun, Dec 26, 2021 at 10:26:23PM +0000, William Kucharski wrote:
> Consolidated multiple review comments into one email; the majority are nits
> at best:
>
> [PATCH 04/48]:
>
> An obnoxiously pedantic English grammar nit:
>
> + * lock). This can also be called from mark_buffer_dirty(), which I
>
> The period should be inside the paren, e.g.: "lock.)"

That's at least debatable.  The full context here is:

[...] A few have the folio blocked from truncation through other
+ * means (eg zap_page_range() has it mapped and is holding the page table
+ * lock).

According to AP Style, the period goes outside the paren in this case:
https://blog.apastyle.org/apastyle/2013/03/punctuation-junction-periods-and-parentheses.html

I'm sure you can find an authority to support always placing a period
inside a paren, but we don't have a controlling authority for how to
punctuate our documentation.  I'm great fun at parties when I get going
on the subject of the Oxford comma.

> [PATCH 05/48]:
>
> +	unsigned char aux[3];
>
> I'd like to see an explanation of why this is "3."

I got rid of it ... for now ;-)

> +static inline void folio_batch_init(struct folio_batch *fbatch)
> +{
> +	fbatch->nr = 0;
> +}
> +
> +static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
> +{
> +	return fbatch->nr;
> +}
> +
> +static inline unsigned int fbatch_space(struct folio_batch *fbatch)
> +{
> +	return PAGEVEC_SIZE - fbatch->nr;
> +}
> +
> +/**
> + * folio_batch_add() - Add a folio to a batch.
> + * @fbatch: The folio batch.
> + * @folio: The folio to add.
> + *
> + * The folio is added to the end of the batch.
> + * The batch must have previously been initialised using folio_batch_init().
> + *
> + * Return: The number of slots still available.
> + */
> +static inline unsigned folio_batch_add(struct folio_batch *fbatch,
> +		struct folio *folio)
> +{
> +	fbatch->folios[fbatch->nr++] = folio;
>
> Is there any need to validate fbatch in these inlines?

I don't think so?  At least, there's no validation for the pagevec
equivalents.  I'd be open to adding something cheap if it's likely to
catch a bug someone's likely to introduce.

> [PATCH 07/48]:
>
> +	xas_for_each(&xas, folio, ULONG_MAX) {		\
> 		unsigned left;				\
> -		if (xas_retry(&xas, head))		\
> +		size_t offset = offset_in_folio(folio, start + __off);	\
> +		if (xas_retry(&xas, folio))		\
> 			continue;			\
> -		if (WARN_ON(xa_is_value(head)))		\
> +		if (WARN_ON(xa_is_value(folio)))	\
> 			break;				\
> -		if (WARN_ON(PageHuge(head)))		\
> +		if (WARN_ON(folio_test_hugetlb(folio)))	\
> 			break;				\
> -		for (j = (head->index < index) ? index - head->index : 0; \
> -		     j < thp_nr_pages(head); j++) {	\
> -			void *kaddr = kmap_local_page(head + j);	\
> -			base = kaddr + offset;		\
> -			len = PAGE_SIZE - offset;	\
> +		while (offset < folio_size(folio)) {	\
>
> Since offset is not actually used until after a bunch of error conditions
> may exit or restart the loop, and isn't used at all in between, defer its
> calculation until just before its first use in the "while."

Hmm.  Those conditions aren't likely to occur, but ... now that you
mention it, checking xa_is_value() after using folio as if it's not a
value is Wrong.  So I'm going to fold in this:

@@ -78,13 +78,14 @@
 	rcu_read_lock();					\
 	xas_for_each(&xas, folio, ULONG_MAX) {			\
 		unsigned left;					\
-		size_t offset = offset_in_folio(folio, start + __off);	\
+		size_t offset;					\
 		if (xas_retry(&xas, folio))			\
 			continue;				\
 		if (WARN_ON(xa_is_value(folio)))		\
 			break;					\
 		if (WARN_ON(folio_test_hugetlb(folio)))		\
 			break;					\
+		offset = offset_in_folio(folio, start + __off);	\
 		while (offset < folio_size(folio)) {		\
 			base = kmap_local_folio(folio, offset);	\
 			len = min(n, len);			\

> Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Thanks.  I'll go through and add that in, then push again.
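The fold-in above is an instance of a general ordering rule: don't derive anything from a loop entry until the checks that may reject it have run. A condensed, compilable userspace C sketch of the fixed shape (all names here — struct entry, is_value, walk() — are illustrative stand-ins, not the iov_iter code):

	#include <stdio.h>

	struct entry {
		int is_value;	/* stand-in for xa_is_value() entries */
		size_t base;
	};

	static void walk(struct entry *entries, size_t n, size_t start)
	{
		for (size_t i = 0; i < n; i++) {
			struct entry *e = &entries[i];
			size_t offset;			/* declared here, computed later */

			if (e->is_value)		/* reject before any other use */
				break;
			offset = start - e->base;	/* safe: e is a real entry */
			printf("entry %zu: offset %zu\n", i, offset);
		}
	}

	int main(void)
	{
		struct entry entries[] = { { 0, 0 }, { 0, 4096 }, { 1, 0 } };

		walk(entries, 3, 8192);
		return 0;
	}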
On Sun, 2 Jan 2022, Matthew Wilcox wrote:
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > This all passes xfstests with no new failures on both xfs and tmpfs.
> > I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.

Yes, please don't re-post.  I've just spent a few days testing and fixing
the shmem end of it (as I did in Nov 2020 - but I've kept closer to your
intent this time), three patches follow:

 mm/shmem.c    | 58 ++++++++++++++++++++++++++------------------
 mm/truncate.c | 15 ++++++------
 2 files changed, 38 insertions(+), 35 deletions(-)

Those patches are against next-20211224, not based on your latest diff;
but I think your shmem.c and truncate.c updates can just be replaced by
mine (yes, I too changed "same_page" to "same_folio").

Hugh
On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> +++ b/include/linux/uio.h
> @@ -150,7 +150,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
>  static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
>  		size_t bytes, struct iov_iter *i)
>  {
> -	return copy_page_to_iter((struct page *)folio, offset, bytes, i);
> +	return copy_page_to_iter(&folio->page, offset, bytes, i);
>  }

Mmmpf.  I should have tested this more thoroughly.  Some files currently
include uio.h without including mm_types.h.  So this on top fixes
the build:

+++ b/include/linux/uio.h
@@ -7,10 +7,10 @@
 
 #include <linux/kernel.h>
 #include <linux/thread_info.h>
+#include <linux/mm_types.h>
 #include <uapi/linux/uio.h>
 
 struct page;
-struct folio;
 struct pipe_inode_info;
 
 struct kvec {
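The build failure being fixed here is the usual forward-declaration pitfall: a bare "struct folio;" lets code pass a struct folio * around, but &folio->page reaches inside the struct, which requires the full definition (hence the mm_types.h include). A tiny standalone C illustration of that rule, with a hypothetical struct widget rather than the kernel types:

	struct widget;			/* forward declaration: type is incomplete */

	/* Fine with only the forward declaration: the pointer stays opaque. */
	static inline struct widget *pass_through(struct widget *w)
	{
		return w;
	}

	/* This one needs the full definition below (or the header providing it),
	 * because it dereferences a member. */
	struct widget {
		int first_member;
	};

	static inline int *first_member_of(struct widget *w)
	{
		return &w->first_member;
	}

Reordering the two halves so first_member_of() came before the definition would reproduce exactly the kind of error the mm_types.h include fixes.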
On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > This all passes xfstests with no new failures on both xfs and tmpfs.
> > I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.

Thanks, the changes look good to me.
Thanks.  I like your changes and will defer to you on the period and
Oxford comma. :-)

> On Jan 2, 2022, at 18:27, Matthew Wilcox <willy@infradead.org> wrote:
>
> Sorry I missed this while travelling.
On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > This all passes xfstests with no new failures on both xfs and tmpfs.
> > I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.

After further review and integrating Hugh's fixes, here's what
I've just updated the for-next tree with.  A little late, but that's
this time of year ...

diff --git a/mm/internal.h b/mm/internal.h
index e989d8ceec91..26af8a5a5be3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -76,28 +76,7 @@ static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }
 
-/*
- * Parameter block passed down to zap_pte_range in exceptional cases.
- */
-struct zap_details {
-	struct address_space *zap_mapping;	/* Check page->mapping if set */
-	struct folio *single_folio;	/* Locked folio to be unmapped */
-};
-
-/*
- * We set details->zap_mappings when we want to unmap shared but keep private
- * pages. Return true if skip zapping this page, false otherwise.
- */
-static inline bool
-zap_skip_check_mapping(struct zap_details *details, struct page *page)
-{
-	if (!details || !page)
-		return false;
-
-	return details->zap_mapping &&
-		(details->zap_mapping != page_rmapping(page));
-}
-
+struct zap_details;
 void unmap_page_range(struct mmu_gather *tlb,
 			     struct vm_area_struct *vma,
 			     unsigned long addr, unsigned long end,
diff --git a/mm/memory.c b/mm/memory.c
index a86027026f2a..23f2f1300d42 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1304,6 +1304,28 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	return ret;
 }
 
+/*
+ * Parameter block passed down to zap_pte_range in exceptional cases.
+ */
+struct zap_details {
+	struct address_space *zap_mapping;	/* Check page->mapping if set */
+	struct folio *single_folio;	/* Locked folio to be unmapped */
+};
+
+/*
+ * We set details->zap_mapping when we want to unmap shared but keep private
+ * pages. Return true if skip zapping this page, false otherwise.
+ */
+static inline bool
+zap_skip_check_mapping(struct zap_details *details, struct page *page)
+{
+	if (!details || !page)
+		return false;
+
+	return details->zap_mapping &&
+		(details->zap_mapping != page_rmapping(page));
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
diff --git a/mm/shmem.c b/mm/shmem.c
index 637de21ff40b..28d627444a24 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -151,19 +151,6 @@ int shmem_getpage(struct inode *inode, pgoff_t index,
 			mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
 }
 
-static int shmem_get_folio(struct inode *inode, pgoff_t index,
-		struct folio **foliop, enum sgp_type sgp)
-{
-	struct page *page = NULL;
-	int ret = shmem_getpage(inode, index, &page, sgp);
-
-	if (page)
-		*foliop = page_folio(page);
-	else
-		*foliop = NULL;
-	return ret;
-}
-
 static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
 {
 	return sb->s_fs_info;
@@ -890,6 +877,28 @@ void shmem_unlock_mapping(struct address_space *mapping)
 	}
 }
 
+static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
+{
+	struct folio *folio;
+	struct page *page;
+
+	/*
+	 * At first avoid shmem_getpage(,,,SGP_READ): that fails
+	 * beyond i_size, and reports fallocated pages as holes.
+	 */
+	folio = __filemap_get_folio(inode->i_mapping, index,
+					FGP_ENTRY | FGP_LOCK, 0);
+	if (!xa_is_value(folio))
+		return folio;
+	/*
+	 * But read a page back from swap if any of it is within i_size
+	 * (although in some cases this is just a waste of time).
+	 */
+	page = NULL;
+	shmem_getpage(inode, index, &page, SGP_READ);
+	return page ? page_folio(page) : NULL;
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -904,10 +913,10 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	struct folio_batch fbatch;
 	pgoff_t indices[PAGEVEC_SIZE];
 	struct folio *folio;
+	bool same_folio;
 	long nr_swaps_freed = 0;
 	pgoff_t index;
 	int i;
-	bool partial_end;
 
 	if (lend == -1)
 		end = -1;	/* unsigned, so actually very big */
@@ -943,14 +952,10 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			index++;
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
-	shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_READ);
+	same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
+	folio = shmem_get_partial_folio(inode, lstart >> PAGE_SHIFT);
 	if (folio) {
-		bool same_folio;
-
 		same_folio = lend < folio_pos(folio) + folio_size(folio);
-		if (same_folio)
-			partial_end = false;
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend)) {
 			start = folio->index + folio_nr_pages(folio);
@@ -962,8 +967,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		folio = NULL;
 	}
 
-	if (partial_end)
-		shmem_get_folio(inode, end, &folio, SGP_READ);
+	if (!same_folio)
+		folio = shmem_get_partial_folio(inode, lend >> PAGE_SHIFT);
 	if (folio) {
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend))
diff --git a/mm/truncate.c b/mm/truncate.c
index 749aac71fda5..5c87cdc70e7b 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -351,7 +351,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	pgoff_t		index;
 	int		i;
 	struct folio	*folio;
-	bool		partial_end;
+	bool		same_folio;
 
 	if (mapping_empty(mapping))
 		goto out;
@@ -388,12 +388,10 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		cond_resched();
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
+	same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
 	folio = __filemap_get_folio(mapping, lstart >> PAGE_SHIFT, FGP_LOCK, 0);
 	if (folio) {
-		bool same_folio = lend < folio_pos(folio) + folio_size(folio);
-		if (same_folio)
-			partial_end = false;
+		same_folio = lend < folio_pos(folio) + folio_size(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend)) {
 			start = folio->index + folio_nr_pages(folio);
 			if (same_folio)
@@ -404,8 +402,9 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		folio = NULL;
 	}
 
-	if (partial_end)
-		folio = __filemap_get_folio(mapping, end, FGP_LOCK, 0);
+	if (!same_folio)
+		folio = __filemap_get_folio(mapping, lend >> PAGE_SHIFT,
+						FGP_LOCK, 0);
 	if (folio) {
 		if (!truncate_inode_partial_folio(folio, lstart, lend))
 			end = folio->index;
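The same_folio test in the diff above replaces the old partial_end bookkeeping with simple index arithmetic: two byte offsets land in the same (base) page precisely when they agree after shifting by PAGE_SHIFT. A small userspace check of that arithmetic, assuming 4KiB pages purely for the sake of the example:

	#include <assert.h>
	#include <stdbool.h>
	#include <stdint.h>

	#define PAGE_SHIFT 12			/* assume 4KiB pages here */
	#define PAGE_SIZE  (1UL << PAGE_SHIFT)

	static bool same_page_index(uint64_t lstart, uint64_t lend)
	{
		return (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
	}

	int main(void)
	{
		/* Hole punch entirely inside one page: one partial page to handle. */
		assert(same_page_index(100, 200));
		/* lend of all-ones (truncate to EOF) never shares lstart's page. */
		assert(!same_page_index(100, UINT64_MAX));
		/* A range ending just before the next page boundary stays in the
		 * same page as a start that sits anywhere in that page. */
		assert(same_page_index(PAGE_SIZE, 2 * PAGE_SIZE - 1));
		assert(!same_page_index(PAGE_SIZE - 1, PAGE_SIZE));
		return 0;
	}

When the two ends do share a page (or folio), only one partial page needs handling, which is why the second __filemap_get_folio() lookup is skipped in that case.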
On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> > On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > > This all passes xfstests with no new failures on both xfs and tmpfs.
> > > I intend to put all this into for-next tomorrow.
> >
> > As a result of Christoph's review, here's the diff.  I don't
> > think it's worth re-posting the entire patch series.
>
> After further review and integrating Hugh's fixes, here's what
> I've just updated the for-next tree with.  A little late, but that's
> this time of year ...

I don't see any fix to shmem_add_to_page_cache() in this diff, my 3/3
shmem: Fix "Unused swap" messages - I'm not sure whether you decided
my fix has to be adjusted or not, but some fix is needed there.

Hugh
On Sat, Jan 08, 2022 at 08:47:49AM -0800, Hugh Dickins wrote:
> On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> > On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> > > On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > > > This all passes xfstests with no new failures on both xfs and tmpfs.
> > > > I intend to put all this into for-next tomorrow.
> > >
> > > As a result of Christoph's review, here's the diff.  I don't
> > > think it's worth re-posting the entire patch series.
> >
> > After further review and integrating Hugh's fixes, here's what
> > I've just updated the for-next tree with.  A little late, but that's
> > this time of year ...
>
> I don't see any fix to shmem_add_to_page_cache() in this diff, my 3/3
> shmem: Fix "Unused swap" messages - I'm not sure whether you decided
> my fix has to be adjusted or not, but some fix is needed there.

I pushed that earlier because I had more confidence in my understanding
of that patch.  Here's what's currently in for-next:

@@ -721,20 +720,18 @@ static int shmem_add_to_page_cache(struct page *page,
 	cgroup_throttle_swaprate(page, gfp);
 
 	do {
-		void *entry;
 		xas_lock_irq(&xas);
-		entry = xas_find_conflict(&xas);
-		if (entry != expected)
+		if (expected != xas_find_conflict(&xas)) {
+			xas_set_err(&xas, -EEXIST);
+			goto unlock;
+		}
+		if (expected && xas_find_conflict(&xas)) {
 			xas_set_err(&xas, -EEXIST);
-		xas_create_range(&xas);
-		if (xas_error(&xas))
 			goto unlock;
-next:
-		xas_store(&xas, page);
-		if (++i < nr) {
-			xas_next(&xas);
-			goto next;
 		}
+		xas_store(&xas, page);
+		if (xas_error(&xas))
+			goto unlock;
 		if (PageTransHuge(page)) {
 			count_vm_event(THP_FILE_ALLOC);
 			__mod_lruvec_page_state(page, NR_SHMEM_THPS, nr);
On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> On Sat, Jan 08, 2022 at 08:47:49AM -0800, Hugh Dickins wrote:
> > On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> > > On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > > > > This all passes xfstests with no new failures on both xfs and tmpfs.
> > > > > I intend to put all this into for-next tomorrow.
> > > >
> > > > As a result of Christoph's review, here's the diff.  I don't
> > > > think it's worth re-posting the entire patch series.
> > >
> > > After further review and integrating Hugh's fixes, here's what
> > > I've just updated the for-next tree with.  A little late, but that's
> > > this time of year ...
> >
> > I don't see any fix to shmem_add_to_page_cache() in this diff, my 3/3
> > shmem: Fix "Unused swap" messages - I'm not sure whether you decided
> > my fix has to be adjusted or not, but some fix is needed there.
>
> I pushed that earlier because I had more confidence in my understanding
> of that patch.  Here's what's currently in for-next:

Okay, thanks: I tried that variant when you proposed it, and it worked fine.

Hugh