Message ID | 20230915095042.1320180-1-da.gomez@samsung.com
---|---
Series | shmem: high order folios support in write path
On 15.09.23 11:51, Daniel Gomez wrote:
> This series adds support for high order folios in the shmem write
> path.
>
> This is a continuation of the shmem work from Luis here [1]
> following Matthew Wilcox's suggestion [2] regarding the path to take
> for the folio allocation order calculation.
>
> [1] RFC v2 add support for blocksize > PAGE_SIZE
> https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
> [2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
>
> Patches have been tested and sent from next-230911. They do apply
> cleanly to the latest next-230914.
>
> fsx and fstests have been performed on tmpfs with noswap with the
> following results:
> - fsx: 2d test, 21,5B
> - fstests: Same result as baseline for next-230911 [3][4][5]
>
> [3] Baseline next-230911 failures are: generic/080 generic/126
> generic/193 generic/633 generic/689
> [4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
> [5] fstests logs patches: https://gitlab.com/-/snippets/3598628
>
> There are at least 2 cases/topics to handle that I'd appreciate
> feedback on.
> 1. With the new strategy, you might end up with a folio order matching
> HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> THP is enabled.
> 2. When the above (1.) occurs, the code skips the huge path, so
> xa_find with hindex is skipped.

Similar to large anon folios (but different to large non-shmem folios in
the pagecache), this can result in memory waste.

We discussed that topic in the last bi-weekly mm meeting, and also how
to eventually configure that for shmem.

Refer to [1] for a summary.

[1] https://lkml.kernel.org/r/4966f496-9f71-460c-b2ab-8661384ce626@arm.com

On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
> On 15.09.23 11:51, Daniel Gomez wrote:
> > This series adds support for high order folios in the shmem write
> > path.
> > There are at least 2 cases/topics to handle that I'd appreciate
> > feedback on.
> > 1. With the new strategy, you might end up with a folio order matching
> > HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> > THP is enabled.
> > 2. When the above (1.) occurs, the code skips the huge path, so
> > xa_find with hindex is skipped.
>
> Similar to large anon folios (but different to large non-shmem folios in
> the pagecache), this can result in memory waste.

No, it can't. This patchset triggers only on write, not on read or page
fault, and it's conservative, so it will only allocate folios which are
entirely covered by the write. IOW this is memory we must allocate in
order to satisfy the write; we're just allocating it in larger chunks
when we can.

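For illustration only, a standalone C model of the conservative behaviour described above: the folio order grows only while a naturally aligned folio of that order still lies entirely inside the written range, so nothing beyond the write is allocated. The helper name and the `MAX_FOLIO_ORDER` cap are invented for this sketch and are not the code from the series.

```c
#include <stdio.h>

#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define MAX_FOLIO_ORDER 9   /* assumed cap for the sketch (PMD order on x86-64, 4 KiB pages) */

static unsigned int write_covered_order(unsigned long pos, unsigned long len)
{
        unsigned long index = pos >> PAGE_SHIFT;   /* first page touched by the write */
        unsigned int order = 0;

        /* a partially written first page can never be part of a bigger covered folio */
        if (pos & (PAGE_SIZE - 1))
                return 0;

        /*
         * Grow the order only while a naturally aligned folio of the next
         * order, starting at 'index', is still fully written by this call.
         */
        while (order < MAX_FOLIO_ORDER &&
               (index & ((2UL << order) - 1)) == 0 &&    /* aligned for order + 1 */
               (PAGE_SIZE << (order + 1)) <= len)        /* order + 1 fully covered */
                order++;

        return order;
}

int main(void)
{
        /* 2 MiB write at a 2 MiB-aligned offset: order 9 (PMD-sized folio) */
        printf("%u\n", write_covered_order(2UL << 20, 2UL << 20));
        /* 16 KiB write at offset 0: order 2 */
        printf("%u\n", write_covered_order(0, 16UL << 10));
        /* 1-byte write in the middle of a page: order 0 */
        printf("%u\n", write_covered_order(123, 1));
        return 0;
}
```

The point of the model is the invariant, not the exact arithmetic: every byte of the allocated folio is a byte the write was going to dirty anyway, which is why no over-allocation can occur.
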
On 15.09.23 17:34, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
>> On 15.09.23 11:51, Daniel Gomez wrote:
>>> This series adds support for high order folios in the shmem write
>>> path.
>>> There are at least 2 cases/topics to handle that I'd appreciate
>>> feedback on.
>>> 1. With the new strategy, you might end up with a folio order matching
>>> HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
>>> THP is enabled.
>>> 2. When the above (1.) occurs, the code skips the huge path, so
>>> xa_find with hindex is skipped.
>>
>> Similar to large anon folios (but different to large non-shmem folios in
>> the pagecache), this can result in memory waste.
>
> No, it can't. This patchset triggers only on write, not on read or page
> fault, and it's conservative, so it will only allocate folios which are
> entirely covered by the write. IOW this is memory we must allocate in
> order to satisfy the write; we're just allocating it in larger chunks
> when we can.

Oh, good! I was assuming you would eventually over-allocate on the write
path.

On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
> On 15.09.23 17:34, Matthew Wilcox wrote:
> > No, it can't. This patchset triggers only on write, not on read or page
> > fault, and it's conservative, so it will only allocate folios which are
> > entirely covered by the write. IOW this is memory we must allocate in
> > order to satisfy the write; we're just allocating it in larger chunks
> > when we can.
>
> Oh, good! I was assuming you would eventually over-allocate on the write
> path.

We might! But that would be a different patchset, and it would be
subject to its own discussion.

Something else I've been wondering about is possibly reallocating the
pages on a write. This would apply to both normal files and shmem.
If you read in a file one byte at a time, then overwrite a big chunk of
it with a large single write, that seems like a good signal that maybe
we should manage that part of the file as a single large chunk instead
of individual pages. Maybe.

Lots of things for people who are obsessed with performance to play
with ;-)

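A toy sketch of the "reallocate on rewrite" signal being floated here, with invented helper names and thresholds (nothing like this exists in the posted series): a single write that completely overwrites a run of order-0 pages is flagged as a candidate for re-allocation as one large folio.

```c
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* pretend cache state: order of the folio backing each page index (0 = small) */
static unsigned int folio_order_at(unsigned long index)
{
        (void)index;
        return 0;   /* assume the file was read in one page at a time */
}

static bool rewrite_suggests_collapse(unsigned long pos, unsigned long len,
                                      unsigned int min_order)
{
        unsigned long first = pos >> PAGE_SHIFT;
        unsigned long last  = (pos + len - 1) >> PAGE_SHIFT;
        unsigned long nr    = last - first + 1;

        /* only a write big enough to fill a large folio is a useful signal */
        if (nr < (1UL << min_order))
                return false;

        /* and only if the range is currently fragmented into small folios */
        for (unsigned long i = first; i <= last; i++)
                if (folio_order_at(i) >= min_order)
                        return false;

        return true;
}

int main(void)
{
        /* 2 MiB overwrite of a byte-at-a-time-read region: collapse candidate */
        printf("%d\n", rewrite_suggests_collapse(0, 2UL << 20, 9));
        /* single-page overwrite: too small to say anything */
        printf("%d\n", rewrite_suggests_collapse(0, PAGE_SIZE, 1));
        return 0;
}
```
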
On 15.09.23 17:40, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
>> On 15.09.23 17:34, Matthew Wilcox wrote:
>>> No, it can't. This patchset triggers only on write, not on read or page
>>> fault, and it's conservative, so it will only allocate folios which are
>>> entirely covered by the write. IOW this is memory we must allocate in
>>> order to satisfy the write; we're just allocating it in larger chunks
>>> when we can.
>>
>> Oh, good! I was assuming you would eventually over-allocate on the write
>> path.
>
> We might! But that would be a different patchset, and it would be
> subject to its own discussion.
>
> Something else I've been wondering about is possibly reallocating the
> pages on a write. This would apply to both normal files and shmem.
> If you read in a file one byte at a time, then overwrite a big chunk of
> it with a large single write, that seems like a good signal that maybe
> we should manage that part of the file as a single large chunk instead
> of individual pages. Maybe.
>
> Lots of things for people who are obsessed with performance to play
> with ;-)

:)

Absolutely. ... because if nobody will be consuming that written memory
any time soon, it might also be the wrong place for a large/huge folio.

On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
> On 15.09.23 11:51, Daniel Gomez wrote:
> > This series adds support for high order folios in the shmem write
> > path.
> >
> > This is a continuation of the shmem work from Luis here [1]
> > following Matthew Wilcox's suggestion [2] regarding the path to take
> > for the folio allocation order calculation.
> >
> > [1] RFC v2 add support for blocksize > PAGE_SIZE
> > https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
> > [2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
> >
> > Patches have been tested and sent from next-230911. They do apply
> > cleanly to the latest next-230914.
> >
> > fsx and fstests have been performed on tmpfs with noswap with the
> > following results:
> > - fsx: 2d test, 21,5B
> > - fstests: Same result as baseline for next-230911 [3][4][5]
> >
> > [3] Baseline next-230911 failures are: generic/080 generic/126
> > generic/193 generic/633 generic/689
> > [4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
> > [5] fstests logs patches: https://gitlab.com/-/snippets/3598628
> >
> > There are at least 2 cases/topics to handle that I'd appreciate
> > feedback on.
> > 1. With the new strategy, you might end up with a folio order matching
> > HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> > THP is enabled.
> > 2. When the above (1.) occurs, the code skips the huge path, so
> > xa_find with hindex is skipped.
>
> Similar to large anon folios (but different to large non-shmem folios in
> the pagecache), this can result in memory waste.
>
> We discussed that topic in the last bi-weekly mm meeting, and also how to
> eventually configure that for shmem.
>
> Refer to [1] for a summary.
>
> [1] https://lkml.kernel.org/r/4966f496-9f71-460c-b2ab-8661384ce626@arm.com

Thanks for the summary David (I was missing linux-MM from kvack in lei).

I think PMD_ORDER-1 as the max would suffice here to honor/respect the
huge flag. However, we would end up with a different max value than
pagecache/readahead.

> --
> Cheers,
>
> David / dhildenb
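A minimal sketch of what capping at PMD_ORDER-1 could look like, with illustrative constants and a hypothetical helper name (not code from the series): the size-derived order is clamped just below HPAGE_PMD_ORDER, so PMD-sized folios keep coming only from the existing 'huge' path.

```c
#include <stdio.h>

#define HPAGE_PMD_ORDER 9   /* 2 MiB huge pages with 4 KiB base pages, for the sketch */

static unsigned int shmem_clamped_order(unsigned int size_order)
{
        unsigned int max = HPAGE_PMD_ORDER - 1;   /* never reach the huge order */

        return size_order < max ? size_order : max;
}

int main(void)
{
        printf("%u\n", shmem_clamped_order(4));   /* small write: unchanged (4) */
        printf("%u\n", shmem_clamped_order(9));   /* would hit PMD order: clamped to 8 */
        return 0;
}
```
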