Message ID | 20250411221111.493193-1-gourry@gourry.net (mailing list archive) |
---|---|
Headers | show |
Series | Promotion of Unmapped Page Cache Folios. | expand |
On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote:
> Unmapped page cache pages can be demoted to low-tier memory, but
No. Page cache should never be demoted to low-tier memory.
NACK this patchset.
On Sat, Apr 12, 2025 at 12:49:18AM +0100, Matthew Wilcox wrote: > On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote: > > Unmapped page cache pages can be demoted to low-tier memory, but > > No. Page cache should never be demoted to low-tier memory. > NACK this patchset. This wasn't a statement of approval page cache being on lower tiers, it's a statement of fact. Enabling demotion causes this issue. ~Gregory
On Fri, Apr 11, 2025 at 08:09:55PM -0400, Gregory Price wrote: > On Sat, Apr 12, 2025 at 12:49:18AM +0100, Matthew Wilcox wrote: > > On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote: > > > Unmapped page cache pages can be demoted to low-tier memory, but > > > > No. Page cache should never be demoted to low-tier memory. > > NACK this patchset. > > This wasn't a statement of approval page cache being on lower tiers, > it's a statement of fact. Enabling demotion causes this issue. Then that's the bug that needs to be fixed. Not adding 200+ lines of code to recover from a situation that should never happen.
On Sat, Apr 12, 2025 at 01:35:56AM +0100, Matthew Wilcox wrote: > On Fri, Apr 11, 2025 at 08:09:55PM -0400, Gregory Price wrote: > > On Sat, Apr 12, 2025 at 12:49:18AM +0100, Matthew Wilcox wrote: > > > On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote: > > > > Unmapped page cache pages can be demoted to low-tier memory, but > > > > > > No. Page cache should never be demoted to low-tier memory. > > > NACK this patchset. > > > > This wasn't a statement of approval page cache being on lower tiers, > > it's a statement of fact. Enabling demotion causes this issue. > > Then that's the bug that needs to be fixed. Not adding 200+ lines > of code to recover from a situation that should never happen. Well, I have a use case that make valuable use of putting the page cache on a farther node rather than pushing it out to disk. But this discussion aside, I think we could simply make this a separate mode of demotion_enabled /* Only demote anonymous memory */ echo 2 > /sys/kernel/mm/numa/demotion_enabled Assuming we can recognize anon from just struct folio ~Gregory
Gregory Price <gourry@gourry.net> writes: > On Sat, Apr 12, 2025 at 01:35:56AM +0100, Matthew Wilcox wrote: >> On Fri, Apr 11, 2025 at 08:09:55PM -0400, Gregory Price wrote: >> > On Sat, Apr 12, 2025 at 12:49:18AM +0100, Matthew Wilcox wrote: >> > > On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote: >> > > > Unmapped page cache pages can be demoted to low-tier memory, but >> > > >> > > No. Page cache should never be demoted to low-tier memory. >> > > NACK this patchset. Hi Matthew, Could you please give some context around why shouldn't page cache be considered as a demotion target if demotion is enabled? Shouldn't demoting page cache pages to a lower tier (when we have enough space in lower tier) can be a better alternative then discarding these pages and later doing I/Os to read them back again? >> > >> > This wasn't a statement of approval page cache being on lower tiers, >> > it's a statement of fact. Enabling demotion causes this issue. >> >> Then that's the bug that needs to be fixed. Not adding 200+ lines >> of code to recover from a situation that should never happen. /me goes and checks when the demotion feature was added... Ok, so I believe this was added here [1] "[PATCH -V10 4/9] mm/migrate: demote pages during reclaim". [1]: https://lore.kernel.org/all/20210715055145.195411-5-ying.huang@intel.com/T/#u I think systems with persistent memory acting as DRAM nodes, could choose to demote page cache pages too, to lower tier instead of simply discarding them and later doing I/O to read them back from disk. e.g. when one has a smaller size DRAM as faster tier and larger size PMEM as slower tier. During memory pressure on faster tier, demoting page cache pages to slower tier can be helpful to avoid doing I/O later to read them back in, isn't it? > > Well, I have a use case that make valuable use of putting the page cache > on a farther node rather than pushing it out to disk. But this > discussion aside, I think we could simply make this a separate mode of > demotion_enabled > > /* Only demote anonymous memory */ > echo 2 > /sys/kernel/mm/numa/demotion_enabled > If we are going down this road... then should we consider what other choices users may need for their usecases? e.g. 0: Demotion disabled 1: Demotion enabled for both anon and file pages Till here the support is already present. 2: Demotion enabled only for anon pages 3: Demotion enabled only for file pages Should this be further classified for dirty v/s clean page cache pages too? > Assuming we can recognize anon from just struct folio I am not 100% sure of this, so others should correct. Should this simply be, folio_is_file_lru() to differentiate page cache pages? Although this still might give us anon pages which have the PG_swapbacked dropped as a result of MADV_FREE. Note sure if that need any special care though? -ritesh
On Sat, Apr 12, 2025 at 05:22:24PM +0530, Ritesh Harjani wrote: > Gregory Price <gourry@gourry.net> writes: > 0: Demotion disabled > 1: Demotion enabled for both anon and file pages > Till here the support is already present. > > 2: Demotion enabled only for anon pages > 3: Demotion enabled only for file pages > > Should this be further classified for dirty v/s clean page cache > pages too? > There are some limitations around migrating dirty pages IIRC, but right now the vmscan code indescriminately adds any and all folios to the demotion list if it gets to that chunk of the code. > > Assuming we can recognize anon from just struct folio > > I am not 100% sure of this, so others should correct. Should this > simply be, folio_is_file_lru() to differentiate page cache pages? > > Although this still might give us anon pages which have the > PG_swapbacked dropped as a result of MADV_FREE. Note sure if that need > any special care though? > I made the comment without looking but yeah, PageAnon/folio_test_anon exist, so this exists in some form somewhere. Basically there's some space to do something a little less indescriminate here. ~Gregory
On 4/12/25 5:19 AM, Matthew Wilcox wrote: > On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote: >> Unmapped page cache pages can be demoted to low-tier memory, but > No. Page cache should never be demoted to low-tier memory. > NACK this patchset. Hi Mathew, I have one doubt. Under memory pressure, page cache allocations can fall back to lower-tier memory, right? So later, if those page cache pages become hot, shouldn't we promote them? Thanks Donet
On Sun, Apr 13, 2025 at 10:53:48AM +0530, Donet Tom wrote: > > On 4/12/25 5:19 AM, Matthew Wilcox wrote: > > On Fri, Apr 11, 2025 at 06:11:05PM -0400, Gregory Price wrote: > > > Unmapped page cache pages can be demoted to low-tier memory, but > > No. Page cache should never be demoted to low-tier memory. > > NACK this patchset. > > Hi Mathew, > > I have one doubt. Under memory pressure, page cache allocations can > fall back to lower-tier memory, right? So later, if those page cache pages > become hot, shouldn't we promote them? That shouldn't happen either. CXL should never be added to the page allocator. You guys are creating a lot of problems for yourselves, and I've been clear that I want no part of this. >
Hi Gregory, Thank you for continuing and sharing this nice work. Adding some comments based on my humble and DAMON-biased perspective below. On Fri, 11 Apr 2025 18:11:05 -0400 Gregory Price <gourry@gourry.net> wrote: > Unmapped page cache pages can be demoted to low-tier memory, but > they can presently only be promoted in two conditions: > 1) The page is fully swapped out and re-faulted > 2) The page becomes mapped (and exposed to NUMA hint faults) Yet another way is running DAMOS with DAMOS_MIGRATE_HOT action and unmapped pages DAMOS filter. Or, without unmapped pages DAMOS filter, if you want to promote both mapped and unmapped pages. It is easy to tell, but I anticiapte it will show many limitations on your use case. I anyway only very recently shared[1] my version of experimental prototype and its evaluation results. I also understand this patch series is the simple and well working solution for your use case, and I have no special blocker for this work. Nevertheless, I was just thinking it would be nice if your anticipated or expected limitations and opportunities of other approaches including DAMON-based one can be put here together. [...] > > Development History and Notes > ======================================= > During development, we explored the following proposals: This is very informative and helpful for getting the context. Thank you for sharing. [...] > 4) Adding a separate kthread - suggested by many DAMON-based approach might also be categorized here since DAMON does access monitoring and monitoring results-based system operations (migration in this context) in a separate thread. > > This is - to an extent - a more general version of the LRU proposal. > We still have to track the folios - which likely requires the > addition of a page flag. In case of DAMON-based approach, this is not technically true, since it uses its own abstraction called DAMON region. Of course, DAMON region abstraction is not a panacea. There were concerns around the accuracy of the region abstraction. We found unoptimum DAMON intervals tuning could be one of the source of the poor accuracy and recently made an automation[2] of the tuning. I remeber you previously mentioned it might make sense to utilize DAMON as a way to save such additional information. It has been one of the motivations for my recent suggestion of a new DAMON API[3], namely damon_report_access(). It will allow any kernel code reports their access finding to DAMON with controlled overhead. The aimed usage is to make page faults handler, folio_mark_accessed(), and AMD IBS sample handers like code path passes the information to DAMON via the API function. > Additionally, this method would actually > contend pretty heavily with LRU behavior - i.e. we'd want to > throttle addition to the promotion candidate list in some scenarios. DAMON-based approach could use DAMOS quota feature for this kind of purpose. > > > 5) Doing it in task work > > This seemed to be the most realistic after considering the above. > > We observe the following: > - FMA is an ideal hook for this and isolation is safe here My one concern is that users can ask DAMON to call folio_mark_accessed() for non unmapped page cache folios, via DAMOS_LRU_PRIO. Promoting the folio could be understood as a sort of LRU-prioritizing, so I'm not really concerned about DAMON's behavioral change that this patch series could introduce. Rather than that, I'm concerned if vmstat change of this patch series could be confused by such DAMON users. > - the new promotion_candidate function is an ideal hook for new > filter logic (throttling, fairness, etc). Agreed. DAMON's target filtering and aim-based aggressiveness self-tuning features could be such logic. I suggested[3] damos_add_folio() as a potential future DAMON API for such use cases. With this patch series, nevertheless, only folio_mark_accessed() called folios could get such opportunity. Do you have future plans to integrate faults-based promotion logic with this function, and extend for other access information source? [1] https://lore.kernel.org/20250320053937.57734-1-sj@kernel.org [2] https://lkml.kernel.org/r/20250303221726.484227-1-sj@kernel.org [3] https://lwn.net/Articles/1016525/ Thanks, SJ [...]