Message ID: 20211208042256.1923824-1-willy@infradead.org
Series: Folios for 5.17
Consolidated multiple review comments into one email; the majority are nits
at best:

[PATCH 04/48]:

An obnoxiously pedantic English grammar nit:

+ * lock). This can also be called from mark_buffer_dirty(), which I

The period should be inside the paren, e.g.: "lock.)"

[PATCH 05/48]:

+	unsigned char aux[3];

I'd like to see an explanation of why this is "3."

+static inline void folio_batch_init(struct folio_batch *fbatch)
+{
+	fbatch->nr = 0;
+}
+
+static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
+{
+	return fbatch->nr;
+}
+
+static inline unsigned int fbatch_space(struct folio_batch *fbatch)
+{
+	return PAGEVEC_SIZE - fbatch->nr;
+}
+
+/**
+ * folio_batch_add() - Add a folio to a batch.
+ * @fbatch: The folio batch.
+ * @folio: The folio to add.
+ *
+ * The folio is added to the end of the batch.
+ * The batch must have previously been initialised using folio_batch_init().
+ *
+ * Return: The number of slots still available.
+ */
+static inline unsigned folio_batch_add(struct folio_batch *fbatch,
+		struct folio *folio)
+{
+	fbatch->folios[fbatch->nr++] = folio;

Is there any need to validate fbatch in these inlines?

[PATCH 07/48]:

+	xas_for_each(&xas, folio, ULONG_MAX) {		\
		unsigned left;				\
-		if (xas_retry(&xas, head))		\
+		size_t offset = offset_in_folio(folio, start + __off);	\
+		if (xas_retry(&xas, folio))		\
			continue;			\
-		if (WARN_ON(xa_is_value(head)))		\
+		if (WARN_ON(xa_is_value(folio)))	\
			break;				\
-		if (WARN_ON(PageHuge(head)))		\
+		if (WARN_ON(folio_test_hugetlb(folio)))	\
			break;				\
-		for (j = (head->index < index) ? index - head->index : 0; \
-		     j < thp_nr_pages(head); j++) {	\
-			void *kaddr = kmap_local_page(head + j);	\
-			base = kaddr + offset;		\
-			len = PAGE_SIZE - offset;	\
+		while (offset < folio_size(folio)) {	\

Since offset is not actually used until after a bunch of error conditions
may exit or restart the loop, and isn't used at all in between, defer its
calculation until just before its first use in the "while."

[PATCH 25/48]:

Whether you use your follow-on patch sent to Christoph or defer it until
later as mentioned in your followup email, either approach is fine with me.

Otherwise it all looks good.

For the series:

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On Dec 7, 2021, at 9:22 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
>
> I was trying to get all the way to adding large folios to the page cache,
> but I ran out of time.  I think the changes in this batch of patches
> are worth adding, even without large folio support for filesystems other
> than tmpfs.
>
> The big change here is the last patch which switches from storing
> large folios in the page cache as 2^N identical entries to using the
> XArray support for multi-index entries.  As the changelog says, this
> fixes a bug that can occur when doing (eg) msync() or sync_file_range().
>
> Some parts of this have been sent before.  The first patch is already
> in Andrew's tree, but I included it here so the build bots don't freak
> out about not being able to apply this patch series.  The folio_batch
> (replacement for pagevec) is new.  I also fixed a few bugs.
>
> This all passes xfstests with no new failures on both xfs and tmpfs.
> I intend to put all this into for-next tomorrow.
>
> Matthew Wilcox (Oracle) (48):
>  filemap: Remove PageHWPoison check from next_uptodate_page()
>  fs/writeback: Convert inode_switch_wbs_work_fn to folios
>  mm/doc: Add documentation for folio_test_uptodate
>  mm/writeback: Improve __folio_mark_dirty() comment
>  pagevec: Add folio_batch
>  iov_iter: Add copy_folio_to_iter()
>  iov_iter: Convert iter_xarray to use folios
>  mm: Add folio_test_pmd_mappable()
>  filemap: Add folio_put_wait_locked()
>  filemap: Convert page_cache_delete to take a folio
>  filemap: Add filemap_unaccount_folio()
>  filemap: Convert tracing of page cache operations to folio
>  filemap: Add filemap_remove_folio and __filemap_remove_folio
>  filemap: Convert find_get_entry to return a folio
>  filemap: Remove thp_contains()
>  filemap: Convert filemap_get_read_batch to use folios
>  filemap: Convert find_get_pages_contig to folios
>  filemap: Convert filemap_read_page to take a folio
>  filemap: Convert filemap_create_page to folio
>  filemap: Convert filemap_range_uptodate to folios
>  readahead: Convert page_cache_async_ra() to take a folio
>  readahead: Convert page_cache_ra_unbounded to folios
>  filemap: Convert do_async_mmap_readahead to take a folio
>  filemap: Convert filemap_fault to folio
>  filemap: Add read_cache_folio and read_mapping_folio
>  filemap: Convert filemap_get_pages to use folios
>  filemap: Convert page_cache_delete_batch to folios
>  filemap: Use folios in next_uptodate_page
>  filemap: Use a folio in filemap_map_pages
>  filemap: Use a folio in filemap_page_mkwrite
>  filemap: Add filemap_release_folio()
>  truncate: Add truncate_cleanup_folio()
>  mm: Add unmap_mapping_folio()
>  shmem: Convert part of shmem_undo_range() to use a folio
>  truncate,shmem: Add truncate_inode_folio()
>  truncate: Skip known-truncated indices
>  truncate: Convert invalidate_inode_pages2_range() to use a folio
>  truncate: Add invalidate_complete_folio2()
>  filemap: Convert filemap_read() to use a folio
>  filemap: Convert filemap_get_read_batch() to use a folio_batch
>  filemap: Return only folios from find_get_entries()
>  mm: Convert find_lock_entries() to use a folio_batch
>  mm: Remove pagevec_remove_exceptionals()
>  fs: Convert vfs_dedupe_file_range_compare to folios
>  truncate: Convert invalidate_inode_pages2_range to folios
>  truncate,shmem: Handle truncates that split large folios
>  XArray: Add xas_advance()
>  mm: Use multi-index entries in the page cache
>
>  fs/fs-writeback.c              |  24 +-
>  fs/remap_range.c               | 116 ++--
>  include/linux/huge_mm.h        |  14 +
>  include/linux/mm.h             |  68 +--
>  include/linux/page-flags.h     |  13 +-
>  include/linux/pagemap.h        |  59 +-
>  include/linux/pagevec.h        |  61 ++-
>  include/linux/uio.h            |   7 +
>  include/linux/xarray.h         |  18 +
>  include/trace/events/filemap.h |  32 +-
>  lib/iov_iter.c                 |  29 +-
>  lib/xarray.c                   |   6 +-
>  mm/filemap.c                   | 967 ++++++++++++++++-----------------
>  mm/folio-compat.c              |  11 +
>  mm/huge_memory.c               |  20 +-
>  mm/internal.h                  |  35 +-
>  mm/khugepaged.c                |  12 +-
>  mm/memory.c                    |  27 +-
>  mm/migrate.c                   |  29 +-
>  mm/page-writeback.c            |   6 +-
>  mm/readahead.c                 |  24 +-
>  mm/shmem.c                     | 174 +++---
>  mm/swap.c                      |  26 +-
>  mm/truncate.c                  | 305 ++++++-----
>  24 files changed, 1100 insertions(+), 983 deletions(-)
>
> --
> 2.33.0
>
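The folio_batch helpers questioned in the review above follow the usual fixed-capacity batch pattern. As a rough, standalone illustration of that pattern (plain userspace C with stand-in names such as item_batch and BATCH_SIZE, not the kernel definitions), the add path with the kind of cheap validation the reviewer asks about could look like this:

	#include <assert.h>

	#define BATCH_SIZE 15			/* stand-in for PAGEVEC_SIZE */

	struct item;				/* stand-in for struct folio */

	struct item_batch {
		unsigned char nr;
		struct item *items[BATCH_SIZE];
	};

	static inline void item_batch_init(struct item_batch *b)
	{
		b->nr = 0;
	}

	static inline unsigned int item_batch_count(const struct item_batch *b)
	{
		return b->nr;
	}

	/* Returns the number of slots still available after the add. */
	static inline unsigned int item_batch_add(struct item_batch *b,
						  struct item *item)
	{
		assert(b->nr < BATCH_SIZE);	/* the cheap check a caller might want */
		b->items[b->nr++] = item;
		return BATCH_SIZE - b->nr;
	}

The caller is expected to drain the batch when the returned slot count reaches zero, which is why the add helper reports remaining space rather than success or failure.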
On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> This all passes xfstests with no new failures on both xfs and tmpfs.
> I intend to put all this into for-next tomorrow.

As a result of Christoph's review, here's the diff.  I don't
think it's worth re-posting the entire patch series.

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 7d3494f7fb70..dda8d5868c81 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -18,6 +18,7 @@ struct page;
 struct folio;
 struct address_space;
 
+/* Layout must match folio_batch */
 struct pagevec {
 	unsigned char nr;
 	bool percpu_pvec_drained;
@@ -85,17 +86,22 @@ static inline void pagevec_release(struct pagevec *pvec)
  * struct folio_batch - A collection of folios.
  *
  * The folio_batch is used to amortise the cost of retrieving and
- * operating on a set of folios.  The order of folios in the batch is
- * not considered important.  Some users of the folio_batch store
- * "exceptional" entries in it which can be removed by calling
- * folio_batch_remove_exceptionals().
+ * operating on a set of folios.  The order of folios in the batch may be
+ * significant (eg delete_from_page_cache_batch()).  Some users of the
+ * folio_batch store "exceptional" entries in it which can be removed
+ * by calling folio_batch_remove_exceptionals().
  */
 struct folio_batch {
 	unsigned char nr;
-	unsigned char aux[3];
+	bool percpu_pvec_drained;
 	struct folio *folios[PAGEVEC_SIZE];
 };
 
+/* Layout must match pagevec */
+static_assert(sizeof(struct pagevec) == sizeof(struct folio_batch));
+static_assert(offsetof(struct pagevec, pages) ==
+		offsetof(struct folio_batch, folios));
+
 /**
  * folio_batch_init() - Initialise a batch of folios
  * @fbatch: The folio batch.
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 8479cf46b5b1..781d98d96611 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -150,7 +150,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
 static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 		size_t bytes, struct iov_iter *i)
 {
-	return copy_page_to_iter((struct page *)folio, offset, bytes, i);
+	return copy_page_to_iter(&folio->page, offset, bytes, i);
 }
 
 static __always_inline __must_check
diff --git a/mm/filemap.c b/mm/filemap.c
index 9b5b2d962c37..33077c264d79 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3451,45 +3451,12 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 	if (folio_test_uptodate(folio))
 		goto out;
 
-	/*
-	 * Page is not up to date and may be locked due to one of the following
-	 * case a: Page is being filled and the page lock is held
-	 * case b: Read/write error clearing the page uptodate status
-	 * case c: Truncation in progress (page locked)
-	 * case d: Reclaim in progress
-	 *
-	 * Case a, the page will be up to date when the page is unlocked.
-	 * There is no need to serialise on the page lock here as the page
-	 * is pinned so the lock gives no additional protection. Even if the
-	 * page is truncated, the data is still valid if PageUptodate as
-	 * it's a race vs truncate race.
-	 * Case b, the page will not be up to date
-	 * Case c, the page may be truncated but in itself, the data may still
-	 * be valid after IO completes as it's a read vs truncate race. The
-	 * operation must restart if the page is not uptodate on unlock but
-	 * otherwise serialising on page lock to stabilise the mapping gives
-	 * no additional guarantees to the caller as the page lock is
-	 * released before return.
-	 * Case d, similar to truncation. If reclaim holds the page lock, it
-	 * will be a race with remove_mapping that determines if the mapping
-	 * is valid on unlock but otherwise the data is valid and there is
-	 * no need to serialise with page lock.
-	 *
-	 * As the page lock gives no additional guarantee, we optimistically
-	 * wait on the page to be unlocked and check if it's up to date and
-	 * use the page if it is. Otherwise, the page lock is required to
-	 * distinguish between the different cases. The motivation is that we
-	 * avoid spurious serialisations and wakeups when multiple processes
-	 * wait on the same page for IO to complete.
-	 */
-	folio_wait_locked(folio);
-	if (folio_test_uptodate(folio))
-		goto out;
-
-	/* Distinguish between all the cases under the safety of the lock */
-	folio_lock(folio);
+	if (!folio_trylock(folio)) {
+		folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE);
+		goto repeat;
+	}
 
-	/* Case c or d, restart the operation */
+	/* Folio was truncated from mapping */
 	if (!folio->mapping) {
 		folio_unlock(folio);
 		folio_put(folio);
@@ -3543,7 +3510,9 @@ EXPORT_SYMBOL(read_cache_folio);
 static struct page *do_read_cache_page(struct address_space *mapping,
 		pgoff_t index, filler_t *filler, void *data, gfp_t gfp)
 {
-	struct folio *folio = read_cache_folio(mapping, index, filler, data);
+	struct folio *folio;
+
+	folio = do_read_cache_folio(mapping, index, filler, data, gfp);
 	if (IS_ERR(folio))
 		return &folio->page;
 	return folio_file_page(folio, index);
diff --git a/mm/shmem.c b/mm/shmem.c
index 735404fdfcc3..6cd3af0addc1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -942,18 +942,18 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			index++;
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
+	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
 	shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_READ);
 	if (folio) {
-		bool same_page;
+		bool same_folio;
 
-		same_page = lend < folio_pos(folio) + folio_size(folio);
-		if (same_page)
+		same_folio = lend < folio_pos(folio) + folio_size(folio);
+		if (same_folio)
 			partial_end = false;
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend)) {
 			start = folio->index + folio_nr_pages(folio);
-			if (same_page)
+			if (same_folio)
 				end = folio->index;
 		}
 		folio_unlock(folio);
diff --git a/mm/truncate.c b/mm/truncate.c
index 336c8d099efa..749aac71fda5 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -350,8 +350,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	pgoff_t		indices[PAGEVEC_SIZE];
 	pgoff_t		index;
 	int		i;
-	struct folio * folio;
-	bool partial_end;
+	struct folio	*folio;
+	bool		partial_end;
 
 	if (mapping_empty(mapping))
 		goto out;
@@ -388,7 +388,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		cond_resched();
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
+	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
 	folio = __filemap_get_folio(mapping, lstart >> PAGE_SHIFT, FGP_LOCK, 0);
 	if (folio) {
 		bool same_folio = lend < folio_pos(folio) + folio_size(folio);
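The static_assert()/offsetof() lines added to pagevec.h in the diff above are a compile-time check that two independently declared structures stay layout-compatible. A minimal standalone C11 illustration of the same idea, using hypothetical struct names rather than the kernel types:

	#include <assert.h>	/* static_assert (C11) */
	#include <stddef.h>	/* offsetof */

	struct old_vec {
		unsigned char nr;
		_Bool drained;
		void *pages[15];
	};

	struct new_batch {
		unsigned char nr;
		_Bool drained;
		void *folios[15];
	};

	/* The build fails if the two layouts ever drift apart. */
	static_assert(sizeof(struct old_vec) == sizeof(struct new_batch),
		      "old_vec and new_batch must have the same size");
	static_assert(offsetof(struct old_vec, pages) ==
		      offsetof(struct new_batch, folios),
		      "the array members must sit at the same offset");

Keeping the check next to the type definitions means a future field addition to one struct but not the other is caught at compile time rather than as silent memory corruption when one type is cast to the other.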
Looks good, addresses one of my comments as well.

Again:

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On Jan 2, 2022, at 9:19 AM, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
>> This all passes xfstests with no new failures on both xfs and tmpfs.
>> I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.
Sorry I missed this while travelling.

On Sun, Dec 26, 2021 at 10:26:23PM +0000, William Kucharski wrote:
> Consolidated multiple review comments into one email; the majority are nits
> at best:
>
> [PATCH 04/48]:
>
> An obnoxiously pedantic English grammar nit:
>
> + * lock). This can also be called from mark_buffer_dirty(), which I
>
> The period should be inside the paren, e.g.: "lock.)"

That's at least debatable.  The full context here is:

[...] A few have the folio blocked from truncation through other
+ * means (eg zap_page_range() has it mapped and is holding the page table
+ * lock).

According to AP Style, the period goes outside the paren in this case:
https://blog.apastyle.org/apastyle/2013/03/punctuation-junction-periods-and-parentheses.html

I'm sure you can find an authority to support always placing a period
inside a paren, but we don't have a controlling authority for how to
punctuate our documentation.  I'm great fun at parties when I get going
on the subject of the Oxford comma.

> [PATCH 05/48]:
>
> +	unsigned char aux[3];
>
> I'd like to see an explanation of why this is "3."

I got rid of it ... for now ;-)

> +static inline void folio_batch_init(struct folio_batch *fbatch)
> +{
> +	fbatch->nr = 0;
> +}
> +
> +static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
> +{
> +	return fbatch->nr;
> +}
> +
> +static inline unsigned int fbatch_space(struct folio_batch *fbatch)
> +{
> +	return PAGEVEC_SIZE - fbatch->nr;
> +}
> +
> +/**
> + * folio_batch_add() - Add a folio to a batch.
> + * @fbatch: The folio batch.
> + * @folio: The folio to add.
> + *
> + * The folio is added to the end of the batch.
> + * The batch must have previously been initialised using folio_batch_init().
> + *
> + * Return: The number of slots still available.
> + */
> +static inline unsigned folio_batch_add(struct folio_batch *fbatch,
> +		struct folio *folio)
> +{
> +	fbatch->folios[fbatch->nr++] = folio;
>
> Is there any need to validate fbatch in these inlines?

I don't think so?  At least, there's no validation for the pagevec
equivalents.  I'd be open to adding something cheap if it's likely to
catch a bug someone's likely to introduce.

> [PATCH 07/48]:
>
> +	xas_for_each(&xas, folio, ULONG_MAX) {		\
> 		unsigned left;				\
> -		if (xas_retry(&xas, head))		\
> +		size_t offset = offset_in_folio(folio, start + __off);	\
> +		if (xas_retry(&xas, folio))		\
> 			continue;			\
> -		if (WARN_ON(xa_is_value(head)))		\
> +		if (WARN_ON(xa_is_value(folio)))	\
> 			break;				\
> -		if (WARN_ON(PageHuge(head)))		\
> +		if (WARN_ON(folio_test_hugetlb(folio)))	\
> 			break;				\
> -		for (j = (head->index < index) ? index - head->index : 0; \
> -		     j < thp_nr_pages(head); j++) {	\
> -			void *kaddr = kmap_local_page(head + j);	\
> -			base = kaddr + offset;		\
> -			len = PAGE_SIZE - offset;	\
> +		while (offset < folio_size(folio)) {	\
>
> Since offset is not actually used until after a bunch of error conditions
> may exit or restart the loop, and isn't used at all in between, defer its
> calculation until just before its first use in the "while."

Hmm.  Those conditions aren't likely to occur, but ... now that you
mention it, checking xa_is_value() after using folio as if it's not a
value is Wrong.  So I'm going to fold in this:

@@ -78,13 +78,14 @@
 	rcu_read_lock();					\
 	xas_for_each(&xas, folio, ULONG_MAX) {			\
 		unsigned left;					\
-		size_t offset = offset_in_folio(folio, start + __off);	\
+		size_t offset;					\
 		if (xas_retry(&xas, folio))			\
 			continue;				\
 		if (WARN_ON(xa_is_value(folio)))		\
 			break;					\
 		if (WARN_ON(folio_test_hugetlb(folio)))		\
 			break;					\
+		offset = offset_in_folio(folio, start + __off);	\
 		while (offset < folio_size(folio)) {		\
 			base = kmap_local_folio(folio, offset);	\
 			len = min(n, len);			\

> Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Thanks.  I'll go through and add that in, then push again.
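The fold-in above is an instance of a general ordering rule: don't derive anything from a loop entry until the checks that may reject it have run. A condensed, compilable userspace C sketch of the fixed shape (all names here — struct entry, is_value, walk() — are illustrative stand-ins, not the iov_iter code):

	#include <stdio.h>

	struct entry {
		int is_value;	/* stand-in for xa_is_value() entries */
		size_t base;
	};

	static void walk(struct entry *entries, size_t n, size_t start)
	{
		for (size_t i = 0; i < n; i++) {
			struct entry *e = &entries[i];
			size_t offset;			/* declared here, computed later */

			if (e->is_value)		/* reject before any other use */
				break;
			offset = start - e->base;	/* safe: e is a real entry */
			printf("entry %zu: offset %zu\n", i, offset);
		}
	}

	int main(void)
	{
		struct entry entries[] = { { 0, 0 }, { 0, 4096 }, { 1, 0 } };

		walk(entries, 3, 8192);
		return 0;
	}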
On Sun, 2 Jan 2022, Matthew Wilcox wrote:
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > This all passes xfstests with no new failures on both xfs and tmpfs.
> > I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.

Yes, please don't re-post.  I've just spent a few days testing and fixing
the shmem end of it (as I did in Nov 2020 - but I've kept closer to your
intent this time), three patches follow:

 mm/shmem.c    | 58 ++++++++++++++++++++++++++------------------
 mm/truncate.c | 15 ++++++------
 2 files changed, 38 insertions(+), 35 deletions(-)

Those patches are against next-20211224, not based on your latest diff;
but I think your shmem.c and truncate.c updates can just be replaced by
mine (yes, I too changed "same_page" to "same_folio").

Hugh
On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> +++ b/include/linux/uio.h
> @@ -150,7 +150,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
>  static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
>  		size_t bytes, struct iov_iter *i)
>  {
> -	return copy_page_to_iter((struct page *)folio, offset, bytes, i);
> +	return copy_page_to_iter(&folio->page, offset, bytes, i);
>  }

Mmmpf.  I should have tested this more thoroughly.  Some files currently
include uio.h without including mm_types.h.  So this on top fixes
the build:

+++ b/include/linux/uio.h
@@ -7,10 +7,10 @@
 
 #include <linux/kernel.h>
 #include <linux/thread_info.h>
+#include <linux/mm_types.h>
 #include <uapi/linux/uio.h>
 
 struct page;
-struct folio;
 struct pipe_inode_info;
 
 struct kvec {
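The build failure being fixed here is the usual forward-declaration pitfall: a bare "struct folio;" lets code pass a struct folio * around, but &folio->page reaches inside the struct, which requires the full definition (hence the mm_types.h include). A tiny standalone C illustration of that rule, with a hypothetical struct widget rather than the kernel types:

	struct widget;			/* forward declaration: type is incomplete */

	/* Fine with only the forward declaration: the pointer stays opaque. */
	static inline struct widget *pass_through(struct widget *w)
	{
		return w;
	}

	/* This one needs the full definition below (or the header providing it),
	 * because it dereferences a member. */
	struct widget {
		int first_member;
	};

	static inline int *first_member_of(struct widget *w)
	{
		return &w->first_member;
	}

Reordering the two halves so first_member_of() came before the definition would reproduce exactly the kind of error the mm_types.h include fixes.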
On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > This all passes xfstests with no new failures on both xfs and tmpfs.
> > I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.

Thanks, the changes look good to me.
Thanks.  I like your changes and will defer to you on the period and
Oxford comma. :-)

> On Jan 2, 2022, at 18:27, Matthew Wilcox <willy@infradead.org> wrote:
>
> Sorry I missed this while travelling.
On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > This all passes xfstests with no new failures on both xfs and tmpfs.
> > I intend to put all this into for-next tomorrow.
>
> As a result of Christoph's review, here's the diff.  I don't
> think it's worth re-posting the entire patch series.

After further review and integrating Hugh's fixes, here's what
I've just updated the for-next tree with.  A little late, but that's
this time of year ...

diff --git a/mm/internal.h b/mm/internal.h
index e989d8ceec91..26af8a5a5be3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -76,28 +76,7 @@ static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }
 
-/*
- * Parameter block passed down to zap_pte_range in exceptional cases.
- */
-struct zap_details {
-	struct address_space *zap_mapping;	/* Check page->mapping if set */
-	struct folio *single_folio;	/* Locked folio to be unmapped */
-};
-
-/*
- * We set details->zap_mappings when we want to unmap shared but keep private
- * pages. Return true if skip zapping this page, false otherwise.
- */
-static inline bool
-zap_skip_check_mapping(struct zap_details *details, struct page *page)
-{
-	if (!details || !page)
-		return false;
-
-	return details->zap_mapping &&
-		(details->zap_mapping != page_rmapping(page));
-}
-
+struct zap_details;
 void unmap_page_range(struct mmu_gather *tlb,
 			     struct vm_area_struct *vma,
 			     unsigned long addr, unsigned long end,
diff --git a/mm/memory.c b/mm/memory.c
index a86027026f2a..23f2f1300d42 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1304,6 +1304,28 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	return ret;
 }
 
+/*
+ * Parameter block passed down to zap_pte_range in exceptional cases.
+ */
+struct zap_details {
+	struct address_space *zap_mapping;	/* Check page->mapping if set */
+	struct folio *single_folio;	/* Locked folio to be unmapped */
+};
+
+/*
+ * We set details->zap_mapping when we want to unmap shared but keep private
+ * pages. Return true if skip zapping this page, false otherwise.
+ */
+static inline bool
+zap_skip_check_mapping(struct zap_details *details, struct page *page)
+{
+	if (!details || !page)
+		return false;
+
+	return details->zap_mapping &&
+		(details->zap_mapping != page_rmapping(page));
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
diff --git a/mm/shmem.c b/mm/shmem.c
index 637de21ff40b..28d627444a24 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -151,19 +151,6 @@ int shmem_getpage(struct inode *inode, pgoff_t index,
 			mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
 }
 
-static int shmem_get_folio(struct inode *inode, pgoff_t index,
-		struct folio **foliop, enum sgp_type sgp)
-{
-	struct page *page = NULL;
-	int ret = shmem_getpage(inode, index, &page, sgp);
-
-	if (page)
-		*foliop = page_folio(page);
-	else
-		*foliop = NULL;
-	return ret;
-}
-
 static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
 {
 	return sb->s_fs_info;
@@ -890,6 +877,28 @@ void shmem_unlock_mapping(struct address_space *mapping)
 	}
 }
 
+static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
+{
+	struct folio *folio;
+	struct page *page;
+
+	/*
+	 * At first avoid shmem_getpage(,,,SGP_READ): that fails
+	 * beyond i_size, and reports fallocated pages as holes.
+	 */
+	folio = __filemap_get_folio(inode->i_mapping, index,
+					FGP_ENTRY | FGP_LOCK, 0);
+	if (!xa_is_value(folio))
+		return folio;
+	/*
+	 * But read a page back from swap if any of it is within i_size
+	 * (although in some cases this is just a waste of time).
+	 */
+	page = NULL;
+	shmem_getpage(inode, index, &page, SGP_READ);
+	return page ? page_folio(page) : NULL;
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -904,10 +913,10 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	struct folio_batch fbatch;
 	pgoff_t indices[PAGEVEC_SIZE];
 	struct folio *folio;
+	bool same_folio;
 	long nr_swaps_freed = 0;
 	pgoff_t index;
 	int i;
-	bool partial_end;
 
 	if (lend == -1)
 		end = -1;	/* unsigned, so actually very big */
@@ -943,14 +952,10 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			index++;
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
-	shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_READ);
+	same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
+	folio = shmem_get_partial_folio(inode, lstart >> PAGE_SHIFT);
 	if (folio) {
-		bool same_folio;
-
 		same_folio = lend < folio_pos(folio) + folio_size(folio);
-		if (same_folio)
-			partial_end = false;
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend)) {
 			start = folio->index + folio_nr_pages(folio);
@@ -962,8 +967,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		folio = NULL;
 	}
 
-	if (partial_end)
-		shmem_get_folio(inode, end, &folio, SGP_READ);
+	if (!same_folio)
+		folio = shmem_get_partial_folio(inode, lend >> PAGE_SHIFT);
 	if (folio) {
 		folio_mark_dirty(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend))
diff --git a/mm/truncate.c b/mm/truncate.c
index 749aac71fda5..5c87cdc70e7b 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -351,7 +351,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	pgoff_t		index;
 	int		i;
 	struct folio	*folio;
-	bool		partial_end;
+	bool		same_folio;
 
 	if (mapping_empty(mapping))
 		goto out;
@@ -388,12 +388,10 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		cond_resched();
 	}
 
-	partial_end = ((lend + 1) % PAGE_SIZE) != 0;
+	same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
 	folio = __filemap_get_folio(mapping, lstart >> PAGE_SHIFT, FGP_LOCK, 0);
 	if (folio) {
-		bool same_folio = lend < folio_pos(folio) + folio_size(folio);
-		if (same_folio)
-			partial_end = false;
+		same_folio = lend < folio_pos(folio) + folio_size(folio);
 		if (!truncate_inode_partial_folio(folio, lstart, lend)) {
 			start = folio->index + folio_nr_pages(folio);
 			if (same_folio)
@@ -404,8 +402,9 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		folio = NULL;
 	}
 
-	if (partial_end)
-		folio = __filemap_get_folio(mapping, end, FGP_LOCK, 0);
+	if (!same_folio)
+		folio = __filemap_get_folio(mapping, lend >> PAGE_SHIFT,
+						FGP_LOCK, 0);
 	if (folio) {
 		if (!truncate_inode_partial_folio(folio, lstart, lend))
 			end = folio->index;
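The same_folio test in the diff above replaces the old partial_end bookkeeping with simple index arithmetic: two byte offsets land in the same (base) page precisely when they agree after shifting by PAGE_SHIFT. A small userspace check of that arithmetic, assuming 4KiB pages purely for the sake of the example:

	#include <assert.h>
	#include <stdbool.h>
	#include <stdint.h>

	#define PAGE_SHIFT 12			/* assume 4KiB pages here */
	#define PAGE_SIZE  (1UL << PAGE_SHIFT)

	static bool same_page_index(uint64_t lstart, uint64_t lend)
	{
		return (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
	}

	int main(void)
	{
		/* Hole punch entirely inside one page: one partial page to handle. */
		assert(same_page_index(100, 200));
		/* lend of all-ones (truncate to EOF) never shares lstart's page. */
		assert(!same_page_index(100, UINT64_MAX));
		/* A range ending just before the next page boundary stays in the
		 * same page as a start that sits anywhere in that page. */
		assert(same_page_index(PAGE_SIZE, 2 * PAGE_SIZE - 1));
		assert(!same_page_index(PAGE_SIZE - 1, PAGE_SIZE));
		return 0;
	}

When the two ends do share a page (or folio), only one partial page needs handling, which is why the second __filemap_get_folio() lookup is skipped in that case.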
On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> > On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > > This all passes xfstests with no new failures on both xfs and tmpfs.
> > > I intend to put all this into for-next tomorrow.
> >
> > As a result of Christoph's review, here's the diff.  I don't
> > think it's worth re-posting the entire patch series.
>
> After further review and integrating Hugh's fixes, here's what
> I've just updated the for-next tree with.  A little late, but that's
> this time of year ...

I don't see any fix to shmem_add_to_page_cache() in this diff, my 3/3
shmem: Fix "Unused swap" messages - I'm not sure whether you decided
my fix has to be adjusted or not, but some fix is needed there.

Hugh
On Sat, Jan 08, 2022 at 08:47:49AM -0800, Hugh Dickins wrote:
> On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> > On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> > > On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > > > This all passes xfstests with no new failures on both xfs and tmpfs.
> > > > I intend to put all this into for-next tomorrow.
> > >
> > > As a result of Christoph's review, here's the diff.  I don't
> > > think it's worth re-posting the entire patch series.
> >
> > After further review and integrating Hugh's fixes, here's what
> > I've just updated the for-next tree with.  A little late, but that's
> > this time of year ...
>
> I don't see any fix to shmem_add_to_page_cache() in this diff, my 3/3
> shmem: Fix "Unused swap" messages - I'm not sure whether you decided
> my fix has to be adjusted or not, but some fix is needed there.

I pushed that earlier because I had more confidence in my understanding
of that patch.  Here's what's currently in for-next:

@@ -721,20 +720,18 @@ static int shmem_add_to_page_cache(struct page *page,
 	cgroup_throttle_swaprate(page, gfp);
 
 	do {
-		void *entry;
 		xas_lock_irq(&xas);
-		entry = xas_find_conflict(&xas);
-		if (entry != expected)
+		if (expected != xas_find_conflict(&xas)) {
+			xas_set_err(&xas, -EEXIST);
+			goto unlock;
+		}
+		if (expected && xas_find_conflict(&xas)) {
 			xas_set_err(&xas, -EEXIST);
-		xas_create_range(&xas);
-		if (xas_error(&xas))
 			goto unlock;
-next:
-		xas_store(&xas, page);
-		if (++i < nr) {
-			xas_next(&xas);
-			goto next;
 		}
+		xas_store(&xas, page);
+		if (xas_error(&xas))
+			goto unlock;
 		if (PageTransHuge(page)) {
 			count_vm_event(THP_FILE_ALLOC);
 			__mod_lruvec_page_state(page, NR_SHMEM_THPS, nr);
On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> On Sat, Jan 08, 2022 at 08:47:49AM -0800, Hugh Dickins wrote:
> > On Sat, 8 Jan 2022, Matthew Wilcox wrote:
> > > On Sun, Jan 02, 2022 at 04:19:41PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Dec 08, 2021 at 04:22:08AM +0000, Matthew Wilcox (Oracle) wrote:
> > > > > This all passes xfstests with no new failures on both xfs and tmpfs.
> > > > > I intend to put all this into for-next tomorrow.
> > > >
> > > > As a result of Christoph's review, here's the diff.  I don't
> > > > think it's worth re-posting the entire patch series.
> > >
> > > After further review and integrating Hugh's fixes, here's what
> > > I've just updated the for-next tree with.  A little late, but that's
> > > this time of year ...
> >
> > I don't see any fix to shmem_add_to_page_cache() in this diff, my 3/3
> > shmem: Fix "Unused swap" messages - I'm not sure whether you decided
> > my fix has to be adjusted or not, but some fix is needed there.
>
> I pushed that earlier because I had more confidence in my understanding
> of that patch.  Here's what's currently in for-next:

Okay, thanks: I tried that variant when you proposed it, and it worked fine.

Hugh