
[v2,1/8] mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true

Message ID 20240604020549.1017540-2-yuanchu@google.com (mailing list archive)
State New
Series mm: workingset reporting

Commit Message

Yuanchu Xie June 4, 2024, 2:05 a.m. UTC
When non-leaf pmd accessed bits are available, MGLRU page table walks
can clear the non-leaf pmd accessed bit and ignore the accessed bit on
the pte if it's on a different node, skipping a generation update as
well. If another scan occurs on the same node as said skipped pte,
the non-leaf pmd accessed bit might remain cleared and the pte accessed
bits won't be checked. While this is sufficient for reclaim-driven
aging, where the goal is to select a reasonably cold page, the access
can be missed when aging proactively for workingset estimation of a
node/memcg.

In more detail, get_pfn_folio returns NULL if the folio's nid != node
under scanning, so the page table walk skips processing of said pte. Now
the pmd_young flag on this pmd is cleared, and if none of the pte's are
accessed before another scan occurs on the folio's node, the pmd_young
check fails and the pte accessed bit is skipped.

Since force_scan disables various other optimizations, we check
force_scan to ignore the non-leaf pmd accessed bit.

Signed-off-by: Yuanchu Xie <yuanchu@google.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
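
The missed-access sequence described in the commit message can be
reproduced with a small userspace model. This is only a sketch under
stated assumptions: struct toy_pmd, struct toy_pte, toy_walk_pmd() and
toy_missed_access_demo() are hypothetical names that do not exist in
mm/vmscan.c, the "pte belongs to another node" test stands in for
get_pfn_folio() returning NULL, and the model clears the pmd accessed
bit only when the shortcut is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTES_PER_PMD 4

/* Toy stand-ins for illustration only; these are not kernel types. */
struct toy_pte {
	int nid;	/* node of the folio this pte maps */
	bool young;	/* pte accessed bit */
};

struct toy_pmd {
	bool young;	/* non-leaf pmd accessed bit */
	struct toy_pte ptes[TOY_PTES_PER_PMD];
};

/*
 * Walk one pmd on behalf of node @nid and return how many accessed ptes
 * belonging to @nid were found. @can_clear_pmd_young models
 * should_clear_pmd_young(); @force_scan models walk->force_scan.
 */
static int toy_walk_pmd(struct toy_pmd *pmd, int nid,
			bool can_clear_pmd_young, bool force_scan)
{
	int found = 0;

	assert(pmd != NULL);

	if (!force_scan && can_clear_pmd_young) {
		/* Shortcut: a clear pmd bit is taken to mean no pte was accessed. */
		if (!pmd->young)
			return 0;
		pmd->young = false;
	}

	for (size_t i = 0; i < TOY_PTES_PER_PMD; i++) {
		struct toy_pte *pte = &pmd->ptes[i];

		/* Models get_pfn_folio() returning NULL for a foreign node. */
		if (pte->nid != nid)
			continue;

		if (pte->young) {
			pte->young = false;
			found++;
		}
	}

	return found;
}

/*
 * Scan node 0 first, then node 1, over a pmd whose only accessed pte
 * maps a folio on node 1. Returns the accesses seen by the node 1 scan.
 */
static int toy_missed_access_demo(bool force_scan)
{
	struct toy_pmd pmd = {
		.young = true,
		.ptes = {
			{ .nid = 0, .young = false },
			{ .nid = 1, .young = true },
			{ .nid = 0, .young = false },
			{ .nid = 1, .young = false },
		},
	};

	toy_walk_pmd(&pmd, 0, true, force_scan);
	return toy_walk_pmd(&pmd, 1, true, force_scan);
}
```

In this model, toy_missed_access_demo(false) reports no accesses for
node 1 because the node 0 scan already cleared the pmd bit, while
toy_missed_access_demo(true) finds the accessed pte.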

Comments

Lance Yang June 4, 2024, 3:56 a.m. UTC | #1
Hi Yuanchu,

Just a few nits below ;)

On Tue, Jun 4, 2024 at 10:06 AM Yuanchu Xie <yuanchu@google.com> wrote:
>
> When non-leaf pmd accessed bits are available, MGLRU page table walks
> can clear the non-leaf pmd accessed bit and ignore the accessed bit on
> the pte if it's on a different node, skipping a generation update as
> well. If another scan occurrs on the same node as said skipped pte.

s/occurrs/occurs

> the non-leaf pmd accessed bit might remain cleared and the pte accessed
> bits won't be checked. While this is sufficient for reclaim-driven
> aging, where the goal is to select a reasonably cold page, the access
> can be missed when aging proactively for workingset estimation of a of a

s/of a of a/of a

> node/memcg.
>
> In more detail, get_pfn_folio returns NULL if the folio's nid != node
> under scanning, so the page table walk skips processing of said pte. Now
> the pmd_young flag on this pmd is cleared, and if none of the pte's are
> accessed before another scan occurrs on the folio's node, the pmd_young

s/occurrs/occurs

Thanks,
Lance

> check fails and the pte accessed bit is skipped.
>
> Since force_scan disables various other optimizations, we check
> force_scan to ignore the non-leaf pmd accessed bit.
>
> Signed-off-by: Yuanchu Xie <yuanchu@google.com>
> ---
>  mm/vmscan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index d55e8d07ffc4..73f3718b33f7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3548,7 +3548,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
>
>                 walk->mm_stats[MM_NONLEAF_TOTAL]++;
>
> -               if (should_clear_pmd_young()) {
> +               if (!walk->force_scan && should_clear_pmd_young()) {
>                         if (!pmd_young(val))
>                                 continue;
>
> --
> 2.45.1.467.gbab1589fc0-goog
>
>
Yu Zhao July 10, 2024, 5:59 p.m. UTC | #2
On Mon, Jun 3, 2024 at 8:06 PM Yuanchu Xie <yuanchu@google.com> wrote:
>
> When non-leaf pmd accessed bits are available, MGLRU page table walks
> can clear the non-leaf pmd accessed bit and ignore the accessed bit on
> the pte if it's on a different node, skipping a generation update as
> well. If another scan occurrs on the same node as said skipped pte.
> the non-leaf pmd accessed bit might remain cleared and the pte accessed
> bits won't be checked. While this is sufficient for reclaim-driven
> aging, where the goal is to select a reasonably cold page, the access
> can be missed when aging proactively for workingset estimation of a of a
> node/memcg.
>
> In more detail, get_pfn_folio returns NULL if the folio's nid != node
> under scanning, so the page table walk skips processing of said pte. Now
> the pmd_young flag on this pmd is cleared, and if none of the pte's are
> accessed before another scan occurrs on the folio's node, the pmd_young
> check fails and the pte accessed bit is skipped.
>
> Since force_scan disables various other optimizations, we check
> force_scan to ignore the non-leaf pmd accessed bit.
>
> Signed-off-by: Yuanchu Xie <yuanchu@google.com>
> ---
>  mm/vmscan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index d55e8d07ffc4..73f3718b33f7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3548,7 +3548,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
>
>                 walk->mm_stats[MM_NONLEAF_TOTAL]++;
>
> -               if (should_clear_pmd_young()) {
> +               if (!walk->force_scan && should_clear_pmd_young()) {
>                         if (!pmd_young(val))
>                                 continue;

What about the other should_clear_pmd_young() in walk_pmd_range_locked()?

With that and the typos fixed, we should probably split this patch
out, since it can get reviewed and merged independently.
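
As a rough illustration of the point about the second call site: one
hypothetical way to keep both checks consistent would be to route them
through a single predicate. The helper below is an assumption made for
this sketch, not code from mm/vmscan.c or from any later revision of
the patch; it only restates the condition in the hunk above as a
standalone function, with should_clear_pmd_young() modelled as a plain
bool argument.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not present in mm/vmscan.c.
 * Returns true when the non-leaf pmd accessed bit may be used as a
 * shortcut, i.e. when the walk is not a forced scan and the
 * architecture/config allows clearing the pmd accessed bit.
 */
static bool toy_use_pmd_young_shortcut(bool force_scan,
				       bool can_clear_pmd_young)
{
	return !force_scan && can_clear_pmd_young;
}
```

If both walk_pmd_range() and walk_pmd_range_locked() consulted the same
predicate, a forced scan would bypass the pmd_young shortcut at either
site.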

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d55e8d07ffc4..73f3718b33f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3548,7 +3548,7 @@  static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 
 		walk->mm_stats[MM_NONLEAF_TOTAL]++;
 
-		if (should_clear_pmd_young()) {
+		if (!walk->force_scan && should_clear_pmd_young()) {
 			if (!pmd_young(val))
 				continue;