
[v4,2/6] mm/damon/paddr: use damon_get_folio_in_region to obtain folio

Message ID: 20250203225604.44742-3-usamaarif642@gmail.com
State New
Series: mm/damon: add support for hugepages

Commit Message

Usama Arif Feb. 3, 2025, 10:55 p.m. UTC
This is introduced for large folios. If a large folio has subpages present in
multiple regions, it will be considered multiple times. This can happen when
checking access or applying DAMOS schemes. For example, in pa_stat, folios
split across N regions will be counted N times, giving inaccurate results.
Hence, only consider a page for an access check/DAMOS scheme if the head page
is part of that region as well.

Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
 mm/damon/paddr.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

Comments

SeongJae Park Feb. 4, 2025, 11:06 p.m. UTC | #1
On Mon,  3 Feb 2025 22:55:29 +0000 Usama Arif <usamaarif642@gmail.com> wrote:

> This is introduced for larger folios. If a large folio has subpages
> present in multiple regions, it will be considered multiple times.
> This can happen when checking access

I think this behavior makes sense.  If a 2 MiB address range is known to be
accessed or not, treating all DAMON regions in the address range as accessed or
not aligns with DAMON's region definition and makes sense in general to me.
Note that DAMON's adaptive regions adjustment mechanism will make the DAMON
regions in a large folio be merged in the near future.

> or applying DAMOS schemes. For example,
> in pa_stat, folios split across N regions will be counted N times,
> giving inaccurate results.

Nice catch!  I agree this could be a problem.

> Hence, only consider a page for an access check/DAMOS scheme if the head
> page is part of that region as well.

For access checking, this seems wrong to me.  This change will unnecessarily
mark some DAMON regions as not accessed.

Even for DAMOS, this seems not very optimum.

1. DAMOS will unnecessarily repeat PAGE_SIZE skips on the second and later
   DAMON regions of the same large folio.
2. If only a small part of the large folio belongs to the first DAMON region,
   and the DAMON region is not eligible for the scheme while the second region
   is, the scheme action will not be applied to the second region.

Also, I think this problem can happen on vaddr case.  Hence, making the change
on core layer makes sense to me.

That is, let damon_ops->apply_scheme() inform the caller (damos_apply_scheme())
how many bytes of the next region should be ignored when applying the action.
Due to the complicated implementation of the DAMOS core logic, this might not
be very simple.

I think the additional complexity might outweigh the benefit for the following
reasons.  First, the adaptive regions adjustment mechanism will make DAMON
regions in the same large folio be merged soon, so the problem will not be
significant in the common case.  Second, any thread could collapse any part of
memory into large folios while DAMOS is traversing DAMON regions, so this
problem cannot be solved perfectly unless we make DAMOS' traversal of DAMON
regions exclusive with large folio collapsing, which would add more complexity.

And given DAMON's best-effort nature, keeping the issue until we get a simple
and effective solution is ok to me, unless a real significant issue from this
is confirmed.

Usama, what do you think?


Thanks,
SJ

[...]
Usama Arif Feb. 5, 2025, 12:46 p.m. UTC | #2
On 04/02/2025 23:06, SeongJae Park wrote:
> On Mon,  3 Feb 2025 22:55:29 +0000 Usama Arif <usamaarif642@gmail.com> wrote:
> 
>> This is introduced for larger folios. If a large folio has subpages
>> present in multiple regions, it will be considered multiple times.
>> This can happen when checking access
> 
> I think this behavior makes sense.  If a 2 MiB address range is known to be
> accessed or not, treating all DAMON regions in the address range as accessed or
> not aligns with DAMON's region definition and makes sense in general to me.
> Note that DAMON's adaptive regions adjustment mechanism will make the DAMON
> regions in a large folio be merged in the near future.
> 
>> or applying DAMOS schemes. For example,
>> in pa_stat, folios split across N regions will be counted N times,
>> giving inaccurate results.
> 
> Nice catch!  I agree this could be a problem.
> 
>> Hence, only consider a page for an access check/DAMOS scheme if the head
>> page is part of that region as well.
> 
> For access checking, this seems wrong to me.  This change will unnecessarily
> mark some DAMON regions as not accessed.
> 
> Even for DAMOS, this seems not very optimum.
> 
> 1. DAMOS will unnecessarily repeat PAGE_SIZE skips on the second and later
>    DAMON regions of the same large folio.

Hi SJ,

Thanks for the reviews! Just a small point on 1. below; most of this is
addressed at the end.

If needed, we could add another argument to damon_get_folio_in_region to return
the folio size when the head page is not part of the region, so that we skip by
that size instead of PAGE_SIZE.

> 2. If only a small part of the large folio belongs to the first DAMON region,
>    and the DAMON region is not eligible for the scheme while the second region
>    is, the scheme action will not be applied to the second region.
>
> Also, I think this problem can happen on vaddr case.  Hence, making the change
> on core layer makes sense to me.
> 
> That is, let damon_ops->apply_scheme() inform the caller (damos_apply_scheme())
> how many bytes of the next region should be ignored when applying the action.
> Due to the complicated implementation of the DAMOS core logic, this might not
> be very simple.
> 
> I think the additional complexity might outweigh the benefit for the following
> reasons.  First, the adaptive regions adjustment mechanism will make DAMON
> regions in the same large folio be merged soon, so the problem will not be
> significant in the common case.  Second, any thread could collapse any part of
> memory into large folios while DAMOS is traversing DAMON regions, so this
> problem cannot be solved perfectly unless we make DAMOS' traversal of DAMON
> regions exclusive with large folio collapsing, which would add more complexity.
> 
> And given DAMON's best-effort nature, keeping the issue until we get a simple
> and effective solution is ok to me, unless a real significant issue from this
> is confirmed.
> 
> Usama, what do you think?
> 
> 

So the reason I added this patch was that, when testing the series on a VM that
was only running a C program continuously accessing 200M of hugepages, DAMON was
reporting more than 200M of hugepages every time; it was usually around
206-208M, but sometimes quite a bit higher. I have added an example at the end
showing 266M of hugepages, which is more than 25% off.
Since these are page-level attributes and not sampling based, I believe we
should report accurate results every time, even at the start.

I think it makes sense when you say that adaptive region adjustment will make
the regions converge, but without this patch, even after 20 minutes of running,
I couldn't get 200M. With this patch, I get 200M every time.

I agree with the points you raised, and I thought of the same things when trying
to come up with this patch. The reason I did it in both the access check and
DAMOS was that I thought it's best to be consistent throughout in how folios in
DAMON regions are treated. I think the solution for this is probably some
compromise between simplicity and accuracy.

I think we have 3 ways forward:
1. Try to improve the current patch, where we ignore the folio if the head page
   is not part of the region, for both the access check and DAMOS. The way we
   treat large folios will then be consistent for all cases, but there are the
   other issues you highlighted above.
2. Only take this patch's approach for DAMOS schemes, i.e. use
   damon_get_folio_in_region only for
   damon_pa_pageout/mark_accessed_or_deactivate/migrate/stat.
   We don't do it for damon_pa_mkold/young.
3. Only take this approach for pa_stat.

My preference is going forward with option 2, as we won't repeatedly
pageout/migrate/etc. the same folio, but I think option 3 is fine as well, so
that we at least get the right stats. Let me know what you think is the best way
forward.


[root@vm4 src]# cat /proc/meminfo | grep -i anonhuge
AnonHugePages:    204800 kB
[root@vm4 src]# ./run.sh 
heatmap: 44444444888889888887333333332222000000000000000000000000000000000000000000000000
# min/max temperatures: -1,610,000,000, 366,877, column size: 25.586 MiB
# damos filters (df): reject none hugepage
0   addr 1024.000 KiB size 203.875 MiB access 0 %   age 7.500 s       df-passed 0 B
1   addr 204.875 MiB  size 5.949 MiB   access 0 %   age 200 ms        df-passed 0 B
2   addr 210.824 MiB  size 16.062 MiB  access 0 %   age 0 ns          df-passed 0 B
3   addr 226.887 MiB  size 37.480 MiB  access 0 %   age 0 ns          df-passed 18.000 MiB
4   addr 264.367 MiB  size 10.316 MiB  access 15 %  age 0 ns          df-passed 12.000 MiB
5   addr 274.684 MiB  size 4.426 MiB   access 10 %  age 0 ns          df-passed 6.000 MiB
6   addr 279.109 MiB  size 5.688 MiB   access 15 %  age 0 ns          df-passed 6.000 MiB
7   addr 284.797 MiB  size 648.000 KiB access 15 %  age 0 ns          df-passed 2.000 MiB
8   addr 285.430 MiB  size 56.000 KiB  access 10 %  age 100 ms        df-passed 2.000 MiB
9   addr 285.484 MiB  size 504.000 KiB access 10 %  age 100 ms        df-passed 2.000 MiB
10  addr 285.977 MiB  size 844.000 KiB access 5 %   age 0 ns          df-passed 2.000 MiB
11  addr 286.801 MiB  size 3.305 MiB   access 5 %   age 0 ns          df-passed 4.000 MiB
12  addr 290.105 MiB  size 6.281 MiB   access 0 %   age 100 ms        df-passed 8.000 MiB
13  addr 296.387 MiB  size 6.285 MiB   access 0 %   age 100 ms        df-passed 8.000 MiB
14  addr 302.672 MiB  size 1.211 MiB   access 5 %   age 0 ns          df-passed 2.000 MiB
15  addr 303.883 MiB  size 1.211 MiB   access 5 %   age 0 ns          df-passed 2.000 MiB
16  addr 305.094 MiB  size 840.000 KiB access 0 %   age 0 ns          df-passed 2.000 MiB
17  addr 305.914 MiB  size 364.000 KiB access 0 %   age 0 ns          df-passed 2.000 MiB
18  addr 306.270 MiB  size 844.000 KiB access 10 %  age 0 ns          df-passed 2.000 MiB
19  addr 307.094 MiB  size 364.000 KiB access 10 %  age 0 ns          df-passed 2.000 MiB
20  addr 307.449 MiB  size 8.484 MiB   access 5 %   age 0 ns          df-passed 10.000 MiB
21  addr 315.934 MiB  size 8.488 MiB   access 5 %   age 0 ns          df-passed 10.000 MiB
22  addr 324.422 MiB  size 772.000 KiB access 0 %   age 100 ms        df-passed 2.000 MiB
23  addr 325.176 MiB  size 1.133 MiB   access 0 %   age 100 ms        df-passed 2.000 MiB
24  addr 326.309 MiB  size 224.000 KiB access 0 %   age 0 ns          df-passed 2.000 MiB
25  addr 326.527 MiB  size 900.000 KiB access 0 %   age 0 ns          df-passed 2.000 MiB
26  addr 327.406 MiB  size 3.961 MiB   access 0 %   age 0 ns          df-passed 4.000 MiB
27  addr 331.367 MiB  size 5.945 MiB   access 0 %   age 0 ns          df-passed 6.000 MiB
28  addr 337.312 MiB  size 1.867 MiB   access 5 %   age 0 ns          df-passed 2.000 MiB
29  addr 339.180 MiB  size 824.000 KiB access 10 %  age 0 ns          df-passed 2.000 MiB
30  addr 339.984 MiB  size 548.000 KiB access 0 %   age 0 ns          df-passed 2.000 MiB
31  addr 340.520 MiB  size 2.141 MiB   access 0 %   age 0 ns          df-passed 4.000 MiB
32  addr 342.660 MiB  size 2.617 MiB   access 0 %   age 0 ns          df-passed 4.000 MiB
33  addr 345.277 MiB  size 1.125 MiB   access 0 %   age 0 ns          df-passed 2.000 MiB
34  addr 346.402 MiB  size 984.000 KiB access 5 %   age 0 ns          df-passed 2.000 MiB
35  addr 347.363 MiB  size 660.000 KiB access 5 %   age 0 ns          df-passed 2.000 MiB
36  addr 348.008 MiB  size 768.000 KiB access 15 %  age 100 ms        df-passed 2.000 MiB
37  addr 348.758 MiB  size 192.000 KiB access 15 %  age 100 ms        df-passed 2.000 MiB
38  addr 348.945 MiB  size 1.098 MiB   access 5 %   age 0 ns          df-passed 2.000 MiB
39  addr 350.043 MiB  size 1.102 MiB   access 5 %   age 0 ns          df-passed 2.000 MiB
40  addr 351.145 MiB  size 768.000 KiB access 0 %   age 0 ns          df-passed 2.000 MiB
41  addr 351.895 MiB  size 516.000 KiB access 0 %   age 0 ns          df-passed 2.000 MiB
42  addr 352.398 MiB  size 2.258 MiB   access 10 %  age 0 ns          df-passed 4.000 MiB
43  addr 354.656 MiB  size 9.039 MiB   access 10 %  age 0 ns          df-passed 10.000 MiB
44  addr 363.695 MiB  size 38.391 MiB  access 0 %   age 0 ns          df-passed 40.000 MiB
45  addr 402.086 MiB  size 16.457 MiB  access 0 %   age 0 ns          df-passed 18.000 MiB
46  addr 418.543 MiB  size 2.168 MiB   access 5 %   age 0 ns          df-passed 4.000 MiB
47  addr 420.711 MiB  size 3.258 MiB   access 5 %   age 0 ns          df-passed 4.000 MiB
48  addr 423.969 MiB  size 124.000 KiB access 10 %  age 100 ms        df-passed 2.000 MiB
49  addr 424.090 MiB  size 32.000 KiB  access 10 %  age 100 ms        df-passed 2.000 MiB
50  addr 424.121 MiB  size 1.469 MiB   access 5 %   age 0 ns          df-passed 2.000 MiB
51  addr 425.590 MiB  size 380.000 KiB access 5 %   age 0 ns          df-passed 2.000 MiB
52  addr 425.961 MiB  size 752.000 KiB access 10 %  age 0 ns          df-passed 2.000 MiB
53  addr 426.695 MiB  size 1.109 MiB   access 10 %  age 0 ns          df-passed 2.000 MiB
54  addr 427.805 MiB  size 54.031 MiB  access 0 %   age 0 ns          df-passed 22.000 MiB
55  addr 481.836 MiB  size 23.160 MiB  access 0 %   age 0 ns          df-passed 0 B
56  addr 504.996 MiB  size 194.918 MiB access 0 %   age 9 s           df-passed 0 B
57  addr 699.914 MiB  size 118.688 MiB access 0 %   age 11.500 s      df-passed 0 B
58  addr 818.602 MiB  size 199.594 MiB access 0 %   age 15.300 s      df-passed 0 B
59  addr 1018.195 MiB size 1.006 GiB   access 0 %   age 16.100 s      df-passed 0 B
total size: 1.999 GiB  df-passed: 266.000 MiB
[root@vm4 src]# cat /proc/meminfo | grep -i anonhuge
AnonHugePages:    204800 kB

Patch

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 0fb61f6ddb8d..3f59a3fdc391 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -19,6 +19,30 @@ 
 #include "../internal.h"
 #include "ops-common.h"
 
+/*
+ * Get an online folio for a paddr if it's in the LRU list and the head page of
+ * the folio is not before region_start.  Otherwise, returns NULL.
+ */
+static struct folio *damon_get_folio_in_region(unsigned long addr, unsigned long region_start)
+{
+	struct page *page = pfn_to_online_page(PHYS_PFN(addr));
+	struct folio *folio;
+
+	if (!page)
+		return NULL;
+
+	folio = page_folio(page);
+	if (addr - folio_page_idx(folio, page) * PAGE_SIZE < region_start)
+		return NULL;
+	if (!folio_test_lru(folio) || !folio_try_get(folio))
+		return NULL;
+	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
+		folio_put(folio);
+		folio = NULL;
+	}
+	return folio;
+}
+
 static bool damon_folio_mkold_one(struct folio *folio,
 		struct vm_area_struct *vma, unsigned long addr, void *arg)
 {
@@ -58,9 +82,9 @@  static void damon_folio_mkold(struct folio *folio)
 
 }
 
-static void damon_pa_mkold(unsigned long paddr)
+static void damon_pa_mkold(unsigned long paddr, unsigned long region_start)
 {
-	struct folio *folio = damon_get_folio(PHYS_PFN(paddr));
+	struct folio *folio = damon_get_folio_in_region(paddr, region_start);
 
 	if (!folio)
 		return;
@@ -73,7 +97,7 @@  static void __damon_pa_prepare_access_check(struct damon_region *r)
 {
 	r->sampling_addr = damon_rand(r->ar.start, r->ar.end);
 
-	damon_pa_mkold(r->sampling_addr);
+	damon_pa_mkold(r->sampling_addr, r->ar.start);
 }
 
 static void damon_pa_prepare_access_checks(struct damon_ctx *ctx)
@@ -148,9 +172,9 @@  static bool damon_folio_young(struct folio *folio)
 	return accessed;
 }
 
-static bool damon_pa_young(unsigned long paddr, unsigned long *folio_sz)
+static bool damon_pa_young(unsigned long paddr, unsigned long *folio_sz, unsigned long region_start)
 {
-	struct folio *folio = damon_get_folio(PHYS_PFN(paddr));
+	struct folio *folio = damon_get_folio_in_region(paddr, region_start);
 	bool accessed;
 
 	if (!folio)
@@ -176,7 +200,7 @@  static void __damon_pa_check_access(struct damon_region *r,
 		return;
 	}
 
-	last_accessed = damon_pa_young(r->sampling_addr, &last_folio_sz);
+	last_accessed = damon_pa_young(r->sampling_addr, &last_folio_sz, r->ar.start);
 	damon_update_region_access_rate(r, last_accessed, attrs);
 
 	last_addr = r->sampling_addr;
@@ -268,7 +292,7 @@  static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s,
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;
@@ -307,7 +331,7 @@  static inline unsigned long damon_pa_mark_accessed_or_deactivate(
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;
@@ -474,7 +498,7 @@  static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;
@@ -518,7 +542,7 @@  static unsigned long damon_pa_stat(struct damon_region *r, struct damos *s,
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
-		struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+		struct folio *folio = damon_get_folio_in_region(addr, r->ar.start);
 
 		if (!folio) {
 			addr += PAGE_SIZE;