| Message ID | 20250203225604.44742-3-usamaarif642@gmail.com (mailing list archive) |
|---|---|
| State | New |
| Series | mm/damon: add support for hugepages |
On Mon, 3 Feb 2025 22:55:29 +0000 Usama Arif <usamaarif642@gmail.com> wrote:

> This is introduced for larger folios. If a large folio has subpages
> present in multiple regions, it will be considered multiple times.
> This can happen when checking access

I think this behavior makes sense. If a 2 MiB address range is known to be
accessed or not, treating the DAMON regions in that address range as all
accessed or not aligns with DAMON's region definition and makes sense in
general, to me. Note that DAMON's adaptive regions adjustment mechanism will
cause the DAMON regions in a large folio to be merged in the near future.

> or applying DAMOS schemes. For example,
> in pa_stat, folios split across N regions will be counted N times,
> giving inaccurate results.

Nice catch! I agree this could be a problem.

> Hence, only consider a page for access checks/DAMOS schemes if the head
> page is part of that region as well.

For access checking, this seems wrong to me. This change will unnecessarily
mark some DAMON regions as not accessed.

Even for DAMOS, this seems suboptimal.

1. DAMOS will unnecessarily repeat PAGE_SIZE skippings on the second and
   later DAMON regions of the same large folio.
2. If only a small part of the large folio belongs to the first DAMON region,
   and the first region is not eligible for the scheme while the second
   region is, the scheme action will not be applied to the second region.

Also, I think this problem can happen in the vaddr case. Hence, making the
change in the core layer makes sense to me.

That is, let damon_ops->apply_scheme() inform the caller
(damos_apply_scheme()) how many bytes of the next region should be ignored
when applying the action. Due to the complicated implementation of the DAMOS
core logic, this might not be very simple.

I think the additional complexity might outweigh the benefit, for the
following reasons. First, the adaptive regions adjustment mechanism will
cause DAMON regions in the same large folio to be merged soon, so the problem
will not be significant in the common case. Second, any thread could collapse
any part of memory into large folios while DAMOS is traversing DAMON regions,
so this problem cannot be solved perfectly unless we make DAMOS' DAMON
regions traversal exclusive with any large folio collapsing, which would add
more complexity.

And given DAMON's best-effort nature, keeping the issue until we get a simple
and effective solution is OK with me, unless a real, significant issue from
this is confirmed.

Usama, what do you think?


Thanks,
SJ

[...]
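For what it's worth, the carry-over interface SJ describes — the per-ops apply function reporting how many bytes at the start of the next region were already covered by a large folio it handled — can be illustrated with a small userspace toy. Everything below (the `apply_scheme()`/`carry_next` names, the fixed 2 MiB folio size) is made up for illustration and is not existing kernel code:

```c
#include <stdio.h>

struct region {
	unsigned long start, end;
};

/* Toy model: every address is assumed to start a 2 MiB folio. The bytes
 * of a folio spilling past the region end are reported via carry_next so
 * the caller can skip them in the next region in one step. */
static unsigned long apply_scheme(struct region *r, unsigned long skip_head,
				  unsigned long *carry_next)
{
	unsigned long addr = r->start + skip_head; /* skip bytes a prior call covered */
	unsigned long applied = 0;

	while (addr < r->end) {
		unsigned long folio_end = addr + (2UL << 20);

		applied += (folio_end < r->end ? folio_end : r->end) - addr;
		if (folio_end > r->end)
			*carry_next = folio_end - r->end;
		addr = folio_end;
	}
	return applied;
}

int main(void)
{
	struct region rs[2] = { { 0, 3UL << 20 }, { 3UL << 20, 6UL << 20 } };
	unsigned long carry = 0;

	for (int i = 0; i < 2; i++) {
		unsigned long skip = carry;

		carry = 0;
		printf("region %d: applied %lu bytes (skipped %lu)\n",
		       i, apply_scheme(&rs[i], skip, &carry), skip);
	}
	return 0;
}
```

Running it, region 0 reports 3 MiB applied and hands 1 MiB of carry to region 1, which then starts past the folio's tail instead of re-skipping it page by page.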
On 04/02/2025 23:06, SeongJae Park wrote:
> On Mon, 3 Feb 2025 22:55:29 +0000 Usama Arif <usamaarif642@gmail.com> wrote:
>
>> This is introduced for larger folios. If a large folio has subpages
>> present in multiple regions, it will be considered multiple times.
>> This can happen when checking access
>
> I think this behavior makes sense. If a 2 MiB address range is known to be
> accessed or not, treating the DAMON regions in that address range as all
> accessed or not aligns with DAMON's region definition and makes sense in
> general, to me. Note that DAMON's adaptive regions adjustment mechanism will
> cause the DAMON regions in a large folio to be merged in the near future.
>
>> or applying DAMOS schemes. For example,
>> in pa_stat, folios split across N regions will be counted N times,
>> giving inaccurate results.
>
> Nice catch! I agree this could be a problem.
>
>> Hence, only consider a page for access checks/DAMOS schemes if the head
>> page is part of that region as well.
>
> For access checking, this seems wrong to me. This change will unnecessarily
> mark some DAMON regions as not accessed.
>
> Even for DAMOS, this seems suboptimal.
>
> 1. DAMOS will unnecessarily repeat PAGE_SIZE skippings on the second and
>    later DAMON regions of the same large folio.

Hi SJ,

Thanks for the reviews! Just a small point on 1. below, but most is addressed
at the end. If needed, we could add another arg to damon_get_folio_in_region
to return the size of the folio when the head page was not part of the
region, so that we skip by that size instead of PAGE_SIZE.

> 2. If only a small part of the large folio belongs to the first DAMON region,
>    and the first region is not eligible for the scheme while the second
>    region is, the scheme action will not be applied to the second region.
>
> Also, I think this problem can happen in the vaddr case. Hence, making the
> change in the core layer makes sense to me.
>
> That is, let damon_ops->apply_scheme() inform the caller
> (damos_apply_scheme()) how many bytes of the next region should be ignored
> when applying the action. Due to the complicated implementation of the DAMOS
> core logic, this might not be very simple.
>
> I think the additional complexity might outweigh the benefit, for the
> following reasons. First, the adaptive regions adjustment mechanism will
> cause DAMON regions in the same large folio to be merged soon, so the problem
> will not be significant in the common case. Second, any thread could collapse
> any part of memory into large folios while DAMOS is traversing DAMON regions,
> so this problem cannot be solved perfectly unless we make DAMOS' DAMON
> regions traversal exclusive with any large folio collapsing, which would add
> more complexity.
>
> And given DAMON's best-effort nature, keeping the issue until we get a simple
> and effective solution is OK with me, unless a real, significant issue from
> this is confirmed.
>
> Usama, what do you think?
>

So the reason I added this patch was that when testing the series on a VM
that was only running a C program continuously accessing 200M of hugepages,
DAMON was reporting more than 200M of hugepages every time. It was usually
around 206-208M, but sometimes quite a bit higher. I have added an example at
the end showing 266M of hugepages, which is more than 25% off. Since these
are page-level attributes and not sampling-based, I believe we should report
accurate results every time, even at the start.
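For concreteness, the damon_get_folio_in_region() tweak mentioned above — reporting how far to skip when a folio's head page falls in an earlier region — could look roughly like the sketch below. The skip_sz out-parameter and this whole variant are hypothetical, not part of the posted patch:

```c
/*
 * Hypothetical variant of damon_get_folio_in_region(): when the head page
 * lies before region_start, also report how many bytes of the folio remain
 * from addr onward, so callers can advance by that amount instead of
 * PAGE_SIZE. A sketch only.
 */
static struct folio *damon_get_folio_in_region(unsigned long addr,
		unsigned long region_start, unsigned long *skip_sz)
{
	struct page *page = pfn_to_online_page(PHYS_PFN(addr));
	struct folio *folio;

	*skip_sz = PAGE_SIZE;
	if (!page)
		return NULL;

	folio = page_folio(page);
	if (addr - folio_page_idx(folio, page) * PAGE_SIZE < region_start) {
		/* Bytes of this folio from addr to its end: the caller can
		 * jump past all of them in one step. */
		*skip_sz = folio_size(folio) -
			folio_page_idx(folio, page) * PAGE_SIZE;
		return NULL;
	}
	if (!folio_test_lru(folio) || !folio_try_get(folio))
		return NULL;
	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
		folio_put(folio);
		folio = NULL;
	}
	return folio;
}
```

Callers' walk loops would then do `addr += skip_sz` rather than `addr += PAGE_SIZE`. Note that folio_size() is read here without holding a reference, so a real version would have to tolerate concurrent splits; the sketch only shows the bookkeeping.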
I think it makes sense when you say that adaptive region adjustment will make
the regions converge, but without this patch, even after 20 minutes of
running, I couldn't get 200M. With this patch, I get 200M every time. I agree
with the points you raised, and I thought of the same when trying to come up
with this patch. The reason I did it in both the access check and DAMOS paths
was that I thought it's best to have consistency throughout in how folios in
DAMON regions are treated.

I think the solution for this is probably some compromise between simplicity
and accuracy. I think we have 3 ways forward:

1. Try and improve the current patch, where we ignore the folio if the head
   page is not part of the region, for both access checks and DAMOS. The way
   we treat large folios will then be consistent for all cases, but there are
   the other issues you highlighted above.
2. Only take the approach in this patch for DAMOS schemes, i.e. use
   damon_get_folio_in_region only in
   damon_pa_pageout/mark_accessed_or_deactivate/migrate/stat, and not in
   damon_pa_mkold/young.
3. Only take this approach for pa_stat.

My preference is going forward with option 2, as we won't do
pageout/migration/etc. repeatedly on the same folio, but I think option 3 is
fine as well, so that we at least get the right stats. Let me know what you
think is the best way forward?

```
[root@vm4 src]# cat /proc/meminfo | grep -i anonhuge
AnonHugePages:    204800 kB
[root@vm4 src]# ./run.sh
heatmap: 44444444888889888887333333332222000000000000000000000000000000000000000000000000
# min/max temperatures: -1,610,000,000, 366,877, column size: 25.586 MiB
# damos filters (df): reject none hugepage
0  addr 1024.000 KiB size 203.875 MiB access 0 %  age 7.500 s  df-passed 0 B
1  addr 204.875 MiB  size 5.949 MiB   access 0 %  age 200 ms   df-passed 0 B
2  addr 210.824 MiB  size 16.062 MiB  access 0 %  age 0 ns     df-passed 0 B
3  addr 226.887 MiB  size 37.480 MiB  access 0 %  age 0 ns     df-passed 18.000 MiB
4  addr 264.367 MiB  size 10.316 MiB  access 15 % age 0 ns     df-passed 12.000 MiB
5  addr 274.684 MiB  size 4.426 MiB   access 10 % age 0 ns     df-passed 6.000 MiB
6  addr 279.109 MiB  size 5.688 MiB   access 15 % age 0 ns     df-passed 6.000 MiB
7  addr 284.797 MiB  size 648.000 KiB access 15 % age 0 ns     df-passed 2.000 MiB
8  addr 285.430 MiB  size 56.000 KiB  access 10 % age 100 ms   df-passed 2.000 MiB
9  addr 285.484 MiB  size 504.000 KiB access 10 % age 100 ms   df-passed 2.000 MiB
10 addr 285.977 MiB  size 844.000 KiB access 5 %  age 0 ns     df-passed 2.000 MiB
11 addr 286.801 MiB  size 3.305 MiB   access 5 %  age 0 ns     df-passed 4.000 MiB
12 addr 290.105 MiB  size 6.281 MiB   access 0 %  age 100 ms   df-passed 8.000 MiB
13 addr 296.387 MiB  size 6.285 MiB   access 0 %  age 100 ms   df-passed 8.000 MiB
14 addr 302.672 MiB  size 1.211 MiB   access 5 %  age 0 ns     df-passed 2.000 MiB
15 addr 303.883 MiB  size 1.211 MiB   access 5 %  age 0 ns     df-passed 2.000 MiB
16 addr 305.094 MiB  size 840.000 KiB access 0 %  age 0 ns     df-passed 2.000 MiB
17 addr 305.914 MiB  size 364.000 KiB access 0 %  age 0 ns     df-passed 2.000 MiB
18 addr 306.270 MiB  size 844.000 KiB access 10 % age 0 ns     df-passed 2.000 MiB
19 addr 307.094 MiB  size 364.000 KiB access 10 % age 0 ns     df-passed 2.000 MiB
20 addr 307.449 MiB  size 8.484 MiB   access 5 %  age 0 ns     df-passed 10.000 MiB
21 addr 315.934 MiB  size 8.488 MiB   access 5 %  age 0 ns     df-passed 10.000 MiB
22 addr 324.422 MiB  size 772.000 KiB access 0 %  age 100 ms   df-passed 2.000 MiB
23 addr 325.176 MiB  size 1.133 MiB   access 0 %  age 100 ms   df-passed 2.000 MiB
24 addr 326.309 MiB  size 224.000 KiB access 0 %  age 0 ns     df-passed 2.000 MiB
25 addr 326.527 MiB  size 900.000 KiB access 0 %  age 0 ns     df-passed 2.000 MiB
26 addr 327.406 MiB  size 3.961 MiB   access 0 %  age 0 ns     df-passed 4.000 MiB
27 addr 331.367 MiB  size 5.945 MiB   access 0 %  age 0 ns     df-passed 6.000 MiB
28 addr 337.312 MiB  size 1.867 MiB   access 5 %  age 0 ns     df-passed 2.000 MiB
29 addr 339.180 MiB  size 824.000 KiB access 10 % age 0 ns     df-passed 2.000 MiB
30 addr 339.984 MiB  size 548.000 KiB access 0 %  age 0 ns     df-passed 2.000 MiB
31 addr 340.520 MiB  size 2.141 MiB   access 0 %  age 0 ns     df-passed 4.000 MiB
32 addr 342.660 MiB  size 2.617 MiB   access 0 %  age 0 ns     df-passed 4.000 MiB
33 addr 345.277 MiB  size 1.125 MiB   access 0 %  age 0 ns     df-passed 2.000 MiB
34 addr 346.402 MiB  size 984.000 KiB access 5 %  age 0 ns     df-passed 2.000 MiB
35 addr 347.363 MiB  size 660.000 KiB access 5 %  age 0 ns     df-passed 2.000 MiB
36 addr 348.008 MiB  size 768.000 KiB access 15 % age 100 ms   df-passed 2.000 MiB
37 addr 348.758 MiB  size 192.000 KiB access 15 % age 100 ms   df-passed 2.000 MiB
38 addr 348.945 MiB  size 1.098 MiB   access 5 %  age 0 ns     df-passed 2.000 MiB
39 addr 350.043 MiB  size 1.102 MiB   access 5 %  age 0 ns     df-passed 2.000 MiB
40 addr 351.145 MiB  size 768.000 KiB access 0 %  age 0 ns     df-passed 2.000 MiB
41 addr 351.895 MiB  size 516.000 KiB access 0 %  age 0 ns     df-passed 2.000 MiB
42 addr 352.398 MiB  size 2.258 MiB   access 10 % age 0 ns     df-passed 4.000 MiB
43 addr 354.656 MiB  size 9.039 MiB   access 10 % age 0 ns     df-passed 10.000 MiB
44 addr 363.695 MiB  size 38.391 MiB  access 0 %  age 0 ns     df-passed 40.000 MiB
45 addr 402.086 MiB  size 16.457 MiB  access 0 %  age 0 ns     df-passed 18.000 MiB
46 addr 418.543 MiB  size 2.168 MiB   access 5 %  age 0 ns     df-passed 4.000 MiB
47 addr 420.711 MiB  size 3.258 MiB   access 5 %  age 0 ns     df-passed 4.000 MiB
48 addr 423.969 MiB  size 124.000 KiB access 10 % age 100 ms   df-passed 2.000 MiB
49 addr 424.090 MiB  size 32.000 KiB  access 10 % age 100 ms   df-passed 2.000 MiB
50 addr 424.121 MiB  size 1.469 MiB   access 5 %  age 0 ns     df-passed 2.000 MiB
51 addr 425.590 MiB  size 380.000 KiB access 5 %  age 0 ns     df-passed 2.000 MiB
52 addr 425.961 MiB  size 752.000 KiB access 10 % age 0 ns     df-passed 2.000 MiB
53 addr 426.695 MiB  size 1.109 MiB   access 10 % age 0 ns     df-passed 2.000 MiB
54 addr 427.805 MiB  size 54.031 MiB  access 0 %  age 0 ns     df-passed 22.000 MiB
55 addr 481.836 MiB  size 23.160 MiB  access 0 %  age 0 ns     df-passed 0 B
56 addr 504.996 MiB  size 194.918 MiB access 0 %  age 9 s      df-passed 0 B
57 addr 699.914 MiB  size 118.688 MiB access 0 %  age 11.500 s df-passed 0 B
58 addr 818.602 MiB  size 199.594 MiB access 0 %  age 15.300 s df-passed 0 B
59 addr 1018.195 MiB size 1.006 GiB   access 0 %  age 16.100 s df-passed 0 B
total size: 1.999 GiB  df-passed: 266.000 MiB
[root@vm4 src]# cat /proc/meminfo | grep -i anonhuge
AnonHugePages:    204800 kB
```
```diff
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 0fb61f6ddb8d..3f59a3fdc391 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -19,6 +19,30 @@
 #include "../internal.h"
 #include "ops-common.h"
 
+/*
+ * Get an online page for a paddr if it's in the LRU list and if the head
+ * page is not before region_start. Otherwise, returns NULL.
+ */
+static struct folio *damon_get_folio_in_region(unsigned long addr, unsigned long region_start)
+{
+	struct page *page = pfn_to_online_page(PHYS_PFN(addr));
+	struct folio *folio;
+
+	if (!page)
+		return NULL;
+
+	folio = page_folio(page);
+	if (addr - folio_page_idx(folio, page) * PAGE_SIZE < region_start)
+		return NULL;
+	if (!folio_test_lru(folio) || !folio_try_get(folio))
+		return NULL;
+	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
+		folio_put(folio);
+		folio = NULL;
+	}
+	return folio;
+}
+
 static bool damon_folio_mkold_one(struct folio *folio,
 		struct vm_area_struct *vma, unsigned long addr, void *arg)
 {
@@ -58,9 +82,9 @@ static void damon_folio_mkold(struct folio *folio)
 }
 
-static void damon_pa_mkold(unsigned long paddr)
+static void damon_pa_mkold(unsigned long paddr, unsigned long region_start)
 {
-	struct folio *folio = damon_get_folio(PHYS_PFN(paddr));
+	struct folio *folio = damon_get_folio_in_region(paddr, region_start);
 
 	if (!folio)
 		return;
@@ -73,7 +97,7 @@ static void __damon_pa_prepare_access_check(struct damon_region *r)
 {
 	r->sampling_addr = damon_rand(r->ar.start, r->ar.end);
 
-	damon_pa_mkold(r->sampling_addr);
+	damon_pa_mkold(r->sampling_addr, r->ar.start);
 }
 
 static void damon_pa_prepare_access_checks(struct damon_ctx *ctx)
@@ -148,9 +172,9 @@ static bool damon_folio_young(struct folio *folio)
 	return accessed;
 }
 
-static bool damon_pa_young(unsigned long paddr, unsigned long *folio_sz)
+static bool damon_pa_young(unsigned long paddr, unsigned long *folio_sz, unsigned long region_start)
 {
-	struct folio *folio = damon_get_folio(PHYS_PFN(paddr));
+	struct folio *folio = damon_get_folio_in_region(paddr, region_start);
 	bool accessed;
 
 	if (!folio)
@@ -176,7 +200,7 @@ static void __damon_pa_check_access(struct damon_region *r,
 		return;
 	}
 
-	last_accessed = damon_pa_young(r->sampling_addr, &last_folio_sz);
+	last_accessed = damon_pa_young(r->sampling_addr, &last_folio_sz, r->ar.start);
 	damon_update_region_access_rate(r, last_accessed, attrs);
 
 	last_addr = r->sampling_addr;
@@ -268,7 +292,7 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s,
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;
@@ -307,7 +331,7 @@ static inline unsigned long damon_pa_mark_accessed_or_deactivate(
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;
@@ -474,7 +498,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;
@@ -518,7 +542,7 @@ static unsigned long damon_pa_stat(struct damon_region *r, struct damos *s,
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;
```
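The head-page test in damon_get_folio_in_region() is plain address arithmetic, and a tiny standalone program (with invented addresses) shows why a folio whose head lies below the region start gets rejected:

```c
#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;
	unsigned long folio_head = 0x200000;   /* head of a 2 MiB folio */
	unsigned long region_start = 0x300000; /* region begins mid-folio */
	unsigned long addr = 0x340000;         /* sampled address inside the region */

	/* folio_page_idx() gives the page's index within its folio; the
	 * kernel code recovers the head as addr - idx * PAGE_SIZE. */
	unsigned long idx = (addr - folio_head) / page_size;
	unsigned long head = addr - idx * page_size;

	printf("head=%#lx, region_start=%#lx -> %s\n", head, region_start,
	       head < region_start ? "folio rejected" : "folio considered");
	return 0;
}
```

Only the region containing the head page acts on the folio, which is what stops pa_stat from counting a folio split across N regions N times.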
This is introduced for larger folios. If a large folio has subpages
present in multiple regions, it will be considered multiple times. This
can happen when checking access or applying DAMOS schemes. For example,
in pa_stat, folios split across N regions will be counted N times,
giving inaccurate results.

Hence, only consider a page for access checks/DAMOS schemes if the head
page is part of that region as well.

Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
 mm/damon/paddr.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)