[0/6] shmem: high order folios support in write path

Message ID: 20230915095042.1320180-1-da.gomez@samsung.com

Message

Daniel Gomez Sept. 15, 2023, 9:51 a.m. UTC
This series adds support for high-order folios in the shmem write
path.

This is a continuation of the shmem work from Luis [1], following
Matthew Wilcox's suggestion [2] regarding the approach to take for the
folio allocation order calculation.

[1] RFC v2 add support for blocksize > PAGE_SIZE
https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
[2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
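
For context, the order calculation being discussed works roughly like
the sketch below. This is only an illustration under my own naming; the
helper name and exact details are not necessarily what the patches
implement:

/*
 * Illustrative sketch: pick the largest folio order that is naturally
 * aligned at @index and entirely covered by the remaining length of
 * the write.
 */
static unsigned int write_folio_order(pgoff_t index, size_t remaining,
				      unsigned int max_order)
{
	unsigned long nr_pages = remaining >> PAGE_SHIFT;
	unsigned int order;

	if (!nr_pages)
		return 0;

	/* Largest order allowed by the remaining write length... */
	order = min_t(unsigned int, ilog2(nr_pages), max_order);

	/* ...clamped so the folio stays naturally aligned at @index. */
	if (index)
		order = min_t(unsigned int, order, __ffs(index));

	return order;
}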

The patches have been tested on, and sent from, next-230911. They also
apply cleanly to the latest next-230914.

fsx and fstests have been run on tmpfs with noswap, with the
following results:
- fsx: 2-day test, 21.5B operations
- fstests: Same result as baseline for next-230911 [3][4][5]

[3] Baseline next-230911 failures are: generic/080 generic/126
generic/193 generic/633 generic/689
[4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
[5] fstests logs patches: https://gitlab.com/-/snippets/3598628

There are at least two cases/topics to handle on which I'd appreciate
feedback:
1. With the new strategy, you might end up with a folio order matching
HPAGE_PMD_ORDER. However, the 'huge' flag is then no longer respected
when THP is enabled.
2. When (1.) occurs, the code skips the huge path, so the xa_find()
lookup with hindex is also skipped.

Daniel

Daniel Gomez (5):
  filemap: make the folio order calculation shareable
  shmem: drop BLOCKS_PER_PAGE macro
  shmem: add order parameter support to shmem_alloc_folio
  shmem: add file length in shmem_get_folio path
  shmem: add large folios support to the write path

Luis Chamberlain (1):
  shmem: account for large order folios

 fs/iomap/buffered-io.c   |  6 ++-
 include/linux/pagemap.h  | 42 ++++++++++++++++---
 include/linux/shmem_fs.h |  2 +-
 mm/filemap.c             |  8 ----
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 91 +++++++++++++++++++++++++---------------
 6 files changed, 100 insertions(+), 51 deletions(-)

--
2.39.2

Comments

David Hildenbrand Sept. 15, 2023, 3:29 p.m. UTC | #1
On 15.09.23 11:51, Daniel Gomez wrote:
> This series add support for high order folios in shmem write
> path.
> 
> This is a continuation of the shmem work from Luis here [1]
> following Matthew Wilcox's suggestion [2] regarding the path to take
> for the folio allocation order calculation.
> 
> [1] RFC v2 add support for blocksize > PAGE_SIZE
> https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
> [2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
> 
> Patches have been tested and sent from next-230911. They do apply
> cleanly to the latest next-230914.
> 
> fsx and fstests has been performed on tmpfs with noswap with the
> following results:
> - fsx: 2d test, 21,5B
> - fstests: Same result as baseline for next-230911 [3][4][5]
> 
> [3] Baseline next-230911 failures are: generic/080 generic/126
> generic/193 generic/633 generic/689
> [4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
> [5] fstests logs patches: https://gitlab.com/-/snippets/3598628
> 
> There are at least 2 cases/topics to handle that I'd appreciate
> feedback.
> 1. With the new strategy, you might end up with a folio order matching
> HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> THP is enabled.
> 2. When the above (1.) occurs, the code skips the huge path, so
> xa_find with hindex is skipped.

Similar to large anon folios (but different to large non-shmem folios in 
the pagecache), this can result in memory waste.

We discussed that topic in the last bi-weekly mm meeting, and also how 
to eventually configure that for shmem.

Refer to [1] for a summary.

[1] https://lkml.kernel.org/r/4966f496-9f71-460c-b2ab-8661384ce626@arm.com
Matthew Wilcox Sept. 15, 2023, 3:34 p.m. UTC | #2
On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
> On 15.09.23 11:51, Daniel Gomez wrote:
> > This series add support for high order folios in shmem write
> > path.
> > There are at least 2 cases/topics to handle that I'd appreciate
> > feedback.
> > 1. With the new strategy, you might end up with a folio order matching
> > HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> > THP is enabled.
> > 2. When the above (1.) occurs, the code skips the huge path, so
> > xa_find with hindex is skipped.
> 
> Similar to large anon folios (but different to large non-shmem folios in the
> pagecache), this can result in memory waste.

No, it can't.  This patchset triggers only on write, not on read or page
fault, and it's conservative, so it will only allocate folios which are
entirely covered by the write.  IOW this is memory we must allocate in
order to satisfy the write; we're just allocating it in larger chunks
when we can.
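
To make that concrete, here is an illustrative view of how a
conservative, write-covered policy splits a write (4 KiB base pages and
naturally aligned folios assumed; the exact split below is an example,
not necessarily what the series produces):

	write() of 1 MiB at offset 0
	  -> one order-8 folio (256 pages), entirely covered by the write
	write() of 24 KiB at offset 8 KiB (page indices 2..7)
	  -> an order-1 folio at index 2 plus an order-2 folio at index 4
	write() of 1 byte at offset 0
	  -> an order-0 folio, exactly as before the series

In every case the allocated pages are pages the write itself dirties,
so no extra memory is committed compared to per-page allocation.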
David Hildenbrand Sept. 15, 2023, 3:36 p.m. UTC | #3
On 15.09.23 17:34, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
>> On 15.09.23 11:51, Daniel Gomez wrote:
>>> This series add support for high order folios in shmem write
>>> path.
>>> There are at least 2 cases/topics to handle that I'd appreciate
>>> feedback.
>>> 1. With the new strategy, you might end up with a folio order matching
>>> HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
>>> THP is enabled.
>>> 2. When the above (1.) occurs, the code skips the huge path, so
>>> xa_find with hindex is skipped.
>>
>> Similar to large anon folios (but different to large non-shmem folios in the
>> pagecache), this can result in memory waste.
> 
> No, it can't.  This patchset triggers only on write, not on read or page
> fault, and it's conservative, so it will only allocate folios which are
> entirely covered by the write.  IOW this is memory we must allocate in
> order to satisfy the write; we're just allocating it in larger chunks
> when we can.

Oh, good! I was assuming you would eventually over-allocate on the write 
path.
Matthew Wilcox Sept. 15, 2023, 3:40 p.m. UTC | #4
On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
> On 15.09.23 17:34, Matthew Wilcox wrote:
> > No, it can't.  This patchset triggers only on write, not on read or page
> > fault, and it's conservative, so it will only allocate folios which are
> > entirely covered by the write.  IOW this is memory we must allocate in
> > order to satisfy the write; we're just allocating it in larger chunks
> > when we can.
> 
> Oh, good! I was assuming you would eventually over-allocate on the write
> path.

We might!  But that would be a different patchset, and it would be
subject to its own discussion.

Something else I've been wondering about is possibly reallocating the
pages on a write.  This would apply to both normal files and shmem.
If you read in a file one byte at a time, then overwrite a big chunk of
it with a large single write, that seems like a good signal that maybe
we should manage that part of the file as a single large chunk instead
of individual pages.  Maybe.

Lots of things for people who are obsessed with performance to play
with ;-)
David Hildenbrand Sept. 15, 2023, 3:43 p.m. UTC | #5
On 15.09.23 17:40, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
>> On 15.09.23 17:34, Matthew Wilcox wrote:
>>> No, it can't.  This patchset triggers only on write, not on read or page
>>> fault, and it's conservative, so it will only allocate folios which are
>>> entirely covered by the write.  IOW this is memory we must allocate in
>>> order to satisfy the write; we're just allocating it in larger chunks
>>> when we can.
>>
>> Oh, good! I was assuming you would eventually over-allocate on the write
>> path.
> 
> We might!  But that would be a different patchset, and it would be
> subject to its own discussion.
> 
> Something else I've been wondering about is possibly reallocating the
> pages on a write.  This would apply to both normal files and shmem.
> If you read in a file one byte at a time, then overwrite a big chunk of
> it with a large single write, that seems like a good signal that maybe
> we should manage that part of the file as a single large chunk instead
> of individual pages.  Maybe.
> 
> Lots of things for people who are obsessed with performance to play
> with ;-)

:) Absolutely. ... because if nobody will be consuming that written 
memory any time soon, it might also be the wrong place for a large/huge 
folio.
Daniel Gomez Sept. 18, 2023, 7:32 a.m. UTC | #6
On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
> On 15.09.23 11:51, Daniel Gomez wrote:
> > This series add support for high order folios in shmem write
> > path.
> >
> > This is a continuation of the shmem work from Luis here [1]
> > following Matthew Wilcox's suggestion [2] regarding the path to take
> > for the folio allocation order calculation.
> >
> > [1] RFC v2 add support for blocksize > PAGE_SIZE
> > https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
> > [2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
> >
> > Patches have been tested and sent from next-230911. They do apply
> > cleanly to the latest next-230914.
> >
> > fsx and fstests has been performed on tmpfs with noswap with the
> > following results:
> > - fsx: 2d test, 21,5B
> > - fstests: Same result as baseline for next-230911 [3][4][5]
> >
> > [3] Baseline next-230911 failures are: generic/080 generic/126
> > generic/193 generic/633 generic/689
> > [4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
> > [5] fstests logs patches: https://gitlab.com/-/snippets/3598628
> >
> > There are at least 2 cases/topics to handle that I'd appreciate
> > feedback.
> > 1. With the new strategy, you might end up with a folio order matching
> > HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> > THP is enabled.
> > 2. When the above (1.) occurs, the code skips the huge path, so
> > xa_find with hindex is skipped.
>
> Similar to large anon folios (but different to large non-shmem folios in the
> pagecache), this can result in memory waste.
>
> We discussed that topic in the last bi-weekly mm meeting, and also how to
> eventually configure that for shmem.
>
> Refer to [1] for a summary.
>
> [1] https://lkml.kernel.org/r/4966f496-9f71-460c-b2ab-8661384ce626@arm.com

Thanks for the summary, David (I was missing linux-mm from kvack in lei).

I think using PMD_ORDER-1 as the max would suffice here to
honor/respect the huge flag, although we would end up with a different
max value than pagecache/readahead.
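
Roughly, what I mean is just capping the write-path order (sketch only;
using the HPAGE_PMD_ORDER macro for the PMD order):

	order = min_t(unsigned int, order, HPAGE_PMD_ORDER - 1);

so the write path never allocates a PMD-sized folio by itself and the
existing 'huge' handling stays in charge of those.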
>
> --
> Cheers,
>
> David / dhildenb
>