| Message ID | 20241219153253.3da9e8aa@fangorn (mailing list archive) |
|---|---|
| State | New |
| Series | mm: remove unnecessary calls to lru_add_drain |
On Thu, 19 Dec 2024 15:32:53 -0500 Rik van Riel <riel@surriel.com> wrote:

> There seem to be several categories of calls to lru_add_drain
> and lru_add_drain_all.
>
> The first are code paths that recently allocated, swapped in,
> or otherwise processed a batch of pages, and want them all on
> the LRU. These drain pages that were recently allocated,
> probably on the local CPU.
>
> A second category are code paths that are actively trying to
> reclaim, migrate, or offline memory. These often use lru_add_drain_all,
> to drain the caches on all CPUs.
>
> However, there also seem to be some other callers where we
> aren't really doing either. They are calling lru_add_drain(),
> despite operating on pages that may have been allocated
> long ago, and quite possibly on different CPUs.
>
> Those calls are not likely to be effective at anything but
> creating lock contention on the LRU locks.
>
> Remove the lru_add_drain calls in the latter category.

These lru_add_drain() calls are the sorts of things we've added as
bugfixes when things go weird in unexpected situations. So the need
for them can be obscure.

I'd be more comfortable if we'd gone through them all, hunted down the
commits which added them, learned why these calls were added then
explained why that reasoning is no longer valid.

A lot of the ones you're removing precede a tlb_gather_mmu() operation.
I wonder why we have (or had) that pattern?
On Thu, 2024-12-19 at 14:14 -0800, Andrew Morton wrote:
> On Thu, 19 Dec 2024 15:32:53 -0500 Rik van Riel <riel@surriel.com>
> wrote:
>
> > There seem to be several categories of calls to lru_add_drain
> > and lru_add_drain_all.
> >
> > The first are code paths that recently allocated, swapped in,
> > or otherwise processed a batch of pages, and want them all on
> > the LRU. These drain pages that were recently allocated,
> > probably on the local CPU.
> >
> > A second category are code paths that are actively trying to
> > reclaim, migrate, or offline memory. These often use
> > lru_add_drain_all, to drain the caches on all CPUs.
> >
> > However, there also seem to be some other callers where we
> > aren't really doing either. They are calling lru_add_drain(),
> > despite operating on pages that may have been allocated
> > long ago, and quite possibly on different CPUs.
> >
> > Those calls are not likely to be effective at anything but
> > creating lock contention on the LRU locks.
> >
> > Remove the lru_add_drain calls in the latter category.
>
> These lru_add_drain() calls are the sorts of things we've added as
> bugfixes when things go weird in unexpected situations. So the need
> for them can be obscure.
>
> I'd be more comfortable if we'd gone through them all, hunted down the
> commits which added them, learned why these calls were added then
> explained why that reasoning is no longer valid.
>
> A lot of the ones you're removing precede a tlb_gather_mmu() operation.
> I wonder why we have (or had) that pattern?
>
It goes all the way back to before we were using git.

In those days, computers had fewer CPUs, and much less memory.
Maybe 2-4 CPUs and 64-256 MB of memory?

That means the value of freeing pages from lru_add_drain in more
places was larger, and the chance of the local CPU holding relevant
pages in its pagevecs would have been larger, too.

On a system with 16 CPUs and 64GB of memory, the cost of flushing the
pagevecs more frequently is higher, while the chance of encountering
the right pages, and the memory benefit, are both lower.

I did not find any changeset in git history where we had an existing
tlb_gather_mmu, and an lru_add_drain was added in front of it.

I did find some other things in 18 years of "git log -S lru_add_drain",
though :)

I found one place in git history where lru_add_drain and tlb_gather_mmu
are added together, like:

f5cc4eef9987 ("VM: make zap_page_range() callers that act on a single VMA use separate helper")

I also found some changesets where unnecessary lru_add_drain calls are
removed:

67e4eb076840 ("mm: thp: don't need to drain lru cache when splitting and mlocking THP")
72b03fcd5d51 ("mm: mlock: remove lru_add_drain_all()")
586a32ac1d33 ("mm: munlock: remove unnecessary call to lru_add_drain()")

Since 2006, most of the places that add lruvec flushing seem to have
added calls to lru_add_drain_all, for example:

7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
a980df33e935 ("khugepaged: drain all LRU caches before scanning pages")
9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
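To put rough numbers on the cost/benefit point above, the sketch below estimates the worst-case amount of memory that can sit in per-CPU LRU batches for the two machine sizes Rik mentions. The batch geometry it assumes (31 folios per batch, about six batch types per CPU, 4 KiB base pages) is not stated anywhere in this thread and is only a rough approximation of recent kernels, so treat the output as an order-of-magnitude illustration, not a measurement.

```c
#include <stdio.h>

static void show(const char *label, double cpus, double ram_bytes)
{
	/* All three constants are assumptions, not values from this thread. */
	const double folios_per_batch = 31;   /* assumed batch capacity      */
	const double batches_per_cpu  = 6;    /* assumed per-CPU batch types */
	const double page_bytes       = 4096; /* assumed base page size      */

	/* Worst case: every batch on every CPU is completely full. */
	double cached = cpus * batches_per_cpu * folios_per_batch * page_bytes;

	printf("%s: at most ~%.1f MiB in per-CPU LRU batches (%.4f%% of RAM)\n",
	       label, cached / (1 << 20), 100.0 * cached / ram_bytes);
}

int main(void)
{
	show("old machine  (4 CPUs, 256 MiB)",  4, 256.0 * (1 << 20));
	show("new machine (16 CPUs,  64 GiB)", 16, 64.0 * (1ULL << 30));
	return 0;
}
```

Even with these generous assumptions the batches hold only a few MiB, which is on the order of 1% of a 256 MiB machine but vanishingly small next to 64 GB.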
I am trying to go beyond the first git commit to find the actual
motivation for adding these lru_add_drain.

On Thu, Dec 19, 2024 at 03:32:53PM -0500, Rik van Riel wrote:
> There seem to be several categories of calls to lru_add_drain
> and lru_add_drain_all.
>
> The first are code paths that recently allocated, swapped in,
> or otherwise processed a batch of pages, and want them all on
> the LRU. These drain pages that were recently allocated,
> probably on the local CPU.
>
> A second category are code paths that are actively trying to
> reclaim, migrate, or offline memory. These often use lru_add_drain_all,
> to drain the caches on all CPUs.
>
> However, there also seem to be some other callers where we
> aren't really doing either. They are calling lru_add_drain(),
> despite operating on pages that may have been allocated
> long ago, and quite possibly on different CPUs.
>
> Those calls are not likely to be effective at anything but
> creating lock contention on the LRU locks.
>
> Remove the lru_add_drain calls in the latter category.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory.c     | 1 -
>  mm/mmap.c       | 2 --
>  mm/swap_state.c | 1 -
>  mm/vma.c        | 2 --
>  4 files changed, 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 75c2dfd04f72..95ce298dc254 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1935,7 +1935,6 @@ void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
>  	struct mmu_notifier_range range;
>  	struct mmu_gather tlb;
>
> -	lru_add_drain();

The above was added in [1]. It seems the motivation was that the LRU
add cache was holding on to some freed pages for a long period of time,
and some workload (AIM9) was suffering because it had to go into
reclaim to flush those pages before it could use them. By draining
here, such a workload would go into reclaim less often. (I am somewhat
extrapolating the reasoning.)

I think it is now ok to remove this draining, as the ratio of pages
stuck in such a cache to total RAM has been drastically reduced, and
the chance of these pages being the main cause of a slowdown is almost
zero.

[1] https://github.com/mpe/linux-fullhistory/commit/15317018be190db05f7420f27afd3d053aad48b5

>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
>  				address, end);
>  	hugetlb_zap_begin(vma, &range.start, &range.end);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d32b7e701058..ef57488f1020 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1660,7 +1660,6 @@ void exit_mmap(struct mm_struct *mm)
>  		goto destroy;
>  	}
>
> -	lru_add_drain();

The above was added in [2]. I think it was just a move from the callee
to multiple callers, i.e. from unmap_page_range() to unmap_region() and
exit_mmap(). For unmap_page_range(), lru_add_drain() was added in [1],
so I think the same reason applies, i.e. we can remove it now.

[2] https://github.com/mpe/linux-fullhistory/commit/5b0aee25a3c09b7c4fbb52a737fc9f8ec6761079

>  	flush_cache_mm(mm);
>  	tlb_gather_mmu_fullmm(&tlb, mm);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> @@ -2103,7 +2102,6 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
>  			vma, new_start, length, false, true))
>  		return -ENOMEM;
>
> -	lru_add_drain();

The above was added by commit b6a2fea39318e ("mm: variable length
argument support"). From what I see, there was no reason provided, and
I couldn't find any discussion of it on lkml. I think it was just
following a pattern from somewhere else of lru_add_drain() along with
tlb_gather_mmu(). I think we can remove this one as well.

>  	tlb_gather_mmu(&tlb, mm);
>  	next = vma_next(&vmi);
>  	if (new_end > old_start) {
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e0c0321b8ff7..ca42b2be64d9 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -317,7 +317,6 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
>  	struct folio_batch folios;
>  	unsigned int refs[PAGEVEC_SIZE];
>
> -	lru_add_drain();

This was added in [1] as well, and I think the same reason applies.

>  	folio_batch_init(&folios);
>  	for (int i = 0; i < nr; i++) {
>  		struct folio *folio = page_folio(encoded_page_ptr(pages[i]));
> diff --git a/mm/vma.c b/mm/vma.c
> index 8e31b7e25aeb..d84e5ef6d15b 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -398,7 +398,6 @@ void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct mmu_gather tlb;
>
> -	lru_add_drain();

Same reason as for exit_mmap().

>  	tlb_gather_mmu(&tlb, mm);
>  	update_hiwater_rss(mm);
>  	unmap_vmas(&tlb, mas, vma, vma->vm_start, vma->vm_end, vma->vm_end,
> @@ -1130,7 +1129,6 @@ static inline void vms_clear_ptes(struct vma_munmap_struct *vms,
>  	 * were isolated before we downgraded mmap_lock.
>  	 */
>  	mas_set(mas_detach, 1);
> -	lru_add_drain();

This is from 9c3ebeda8fb5a ("mm/vma: track start and end for munmap in
vma_munmap_struct") and I think it was also just following the pattern.
I think we can remove this one as well.

>  	tlb_gather_mmu(&tlb, vms->vma->vm_mm);
>  	update_hiwater_rss(vms->vma->vm_mm);
>  	unmap_vmas(&tlb, mas_detach, vms->vma, vms->start, vms->end,

I hope this much history is good enough to convince Andrew. With that
please add:

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
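The per-call-site history above, and the "allocated long ago, quite possibly on different CPUs" argument from the patch description, both come down to the same mechanism: lru_add_drain() only flushes the calling CPU's batches, so at these call sites it mostly moves unrelated pages while taking the shared LRU lock. The toy user-space model below is only a sketch of that behaviour; the names (cpu_batch, drain_cpu_batch) and layout are invented for illustration and do not correspond to the kernel's actual data structures.

```c
#include <stdio.h>

#define NR_CPUS    4
#define BATCH_SIZE 31

struct cpu_batch {
	int nr;
	int pages[BATCH_SIZE];	/* pages this CPU has queued for the LRU */
};

static struct cpu_batch batches[NR_CPUS];
static int lru_lock_acquisitions;

/* Queue a freshly touched page on the local CPU's batch. */
static void batch_add(int cpu, int page)
{
	batches[cpu].pages[batches[cpu].nr++] = page;
}

/* Toy model of lru_add_drain(): flush only the *calling* CPU's batch. */
static void drain_cpu_batch(int cpu)
{
	if (!batches[cpu].nr) {
		printf("cpu%d drain: batch empty, nothing to do\n", cpu);
		return;
	}
	lru_lock_acquisitions++;	/* contends on the shared LRU lock */
	printf("cpu%d drain: flushed %d page(s), none of them ours\n",
	       cpu, batches[cpu].nr);
	batches[cpu].nr = 0;
}

int main(void)
{
	/* The pages being unmapped were faulted in long ago, on CPU 1. */
	batch_add(1, 100);
	batch_add(1, 101);

	/* CPU 3 recently touched some unrelated page. */
	batch_add(3, 555);

	/* Much later, CPU 3 unmaps pages 100/101 and drains "just in case". */
	drain_cpu_batch(3);

	printf("LRU lock taken %d time(s); pages 100/101 still batched on cpu1 (%d)\n",
	       lru_lock_acquisitions, batches[1].nr);
	return 0;
}
```

Running it shows CPU 3's drain flushing only its own unrelated page while pages 100/101 stay batched on CPU 1: lock traffic with no benefit for the unmap, which is the scenario the removed calls cannot help with.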
>
>> tlb_gather_mmu(&tlb, vms->vma->vm_mm);
>> update_hiwater_rss(vms->vma->vm_mm);
>> unmap_vmas(&tlb, mas_detach, vms->vma, vms->start, vms->end,
>
> I hope this much history is good enough to convince Andrew. With that
> please add:
>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
>

Yes, seems reasonable to me.

Does it matter on small (32bit?) machines? Hard to judge, but being too
careful here likely does not make sense until we learn that there is a
real problem.

Acked-by: David Hildenbrand <david@redhat.com>
diff --git a/mm/memory.c b/mm/memory.c
index 75c2dfd04f72..95ce298dc254 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1935,7 +1935,6 @@ void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
 	struct mmu_notifier_range range;
 	struct mmu_gather tlb;
 
-	lru_add_drain();
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
 				address, end);
 	hugetlb_zap_begin(vma, &range.start, &range.end);
diff --git a/mm/mmap.c b/mm/mmap.c
index d32b7e701058..ef57488f1020 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1660,7 +1660,6 @@ void exit_mmap(struct mm_struct *mm)
 		goto destroy;
 	}
 
-	lru_add_drain();
 	flush_cache_mm(mm);
 	tlb_gather_mmu_fullmm(&tlb, mm);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
@@ -2103,7 +2102,6 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
 			vma, new_start, length, false, true))
 		return -ENOMEM;
 
-	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm);
 	next = vma_next(&vmi);
 	if (new_end > old_start) {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e0c0321b8ff7..ca42b2be64d9 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -317,7 +317,6 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 	struct folio_batch folios;
 	unsigned int refs[PAGEVEC_SIZE];
 
-	lru_add_drain();
 	folio_batch_init(&folios);
 	for (int i = 0; i < nr; i++) {
 		struct folio *folio = page_folio(encoded_page_ptr(pages[i]));
diff --git a/mm/vma.c b/mm/vma.c
index 8e31b7e25aeb..d84e5ef6d15b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -398,7 +398,6 @@ void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather tlb;
 
-	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mas, vma, vma->vm_start, vma->vm_end, vma->vm_end,
@@ -1130,7 +1129,6 @@ static inline void vms_clear_ptes(struct vma_munmap_struct *vms,
 	 * were isolated before we downgraded mmap_lock.
 	 */
 	mas_set(mas_detach, 1);
-	lru_add_drain();
 	tlb_gather_mmu(&tlb, vms->vma->vm_mm);
 	update_hiwater_rss(vms->vma->vm_mm);
 	unmap_vmas(&tlb, mas_detach, vms->vma, vms->start, vms->end,
There seem to be several categories of calls to lru_add_drain
and lru_add_drain_all.

The first are code paths that recently allocated, swapped in,
or otherwise processed a batch of pages, and want them all on
the LRU. These drain pages that were recently allocated,
probably on the local CPU.

A second category are code paths that are actively trying to
reclaim, migrate, or offline memory. These often use lru_add_drain_all,
to drain the caches on all CPUs.

However, there also seem to be some other callers where we
aren't really doing either. They are calling lru_add_drain(),
despite operating on pages that may have been allocated
long ago, and quite possibly on different CPUs.

Those calls are not likely to be effective at anything but
creating lock contention on the LRU locks.

Remove the lru_add_drain calls in the latter category.

Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c     | 1 -
 mm/mmap.c       | 2 --
 mm/swap_state.c | 1 -
 mm/vma.c        | 2 --
 4 files changed, 6 deletions(-)