| Message ID | 20230830185041.3427464-4-willy@infradead.org (mailing list archive) |
| --- | --- |
| State | New |
| Series | Rearrange batched folio freeing |
On 30/08/2023 19:50, Matthew Wilcox (Oracle) wrote:
> This function will be used imminently.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h |  7 ++++++
>  mm/swap.c          | 56 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a517ef8d2386..4f9e2cfb372e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1515,6 +1515,13 @@ typedef union {
>  void release_pages(release_pages_arg, int nr);
>  void folios_put(struct folio_batch *folios);
>
> +struct pfn_range {
> +	unsigned long start;
> +	unsigned long end;
> +};

As mentioned in the other thread, I think we need to better convey that a
pfn_range must be fully contained within a single folio. Perhaps it's
better to call it `struct folio_region`?

> +
> +void pfn_range_put(struct pfn_range *range, unsigned int nr);

If going with `struct folio_region`, we could call this folios_put_refs()?

Or if you hate that idea and want to stick with pfn, then at least call it
pfn_ranges_put() (with an s on ranges).

> +
>  static inline void put_page(struct page *page)
>  {
>  	struct folio *folio = page_folio(page);
> diff --git a/mm/swap.c b/mm/swap.c
> index 8bd15402cd8f..218d2cc4c6f4 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1027,6 +1027,62 @@ void folios_put(struct folio_batch *folios)
>  }
>  EXPORT_SYMBOL(folios_put);
>
> +void pfn_range_put(struct pfn_range *range, unsigned int nr)
> +{
> +	struct folio_batch folios;
> +	unsigned int i;
> +	struct lruvec *lruvec = NULL;
> +	unsigned long flags = 0;
> +
> +	folio_batch_init(&folios);
> +	for (i = 0; i < nr; i++) {
> +		struct folio *folio = pfn_folio(range[i].start);
> +		unsigned int refs = range[i].end - range[i].start;

Don't you need to check for the huge zero page like in folios_put()?

	if (is_huge_zero_page(&folio->page))
		continue;

> +
> +		if (folio_is_zone_device(folio)) {
> +			if (lruvec) {
> +				unlock_page_lruvec_irqrestore(lruvec, flags);
> +				lruvec = NULL;
> +			}
> +			if (put_devmap_managed_page_refs(&folio->page, refs))
> +				continue;
> +			if (folio_ref_sub_and_test(folio, refs))
> +				free_zone_device_page(&folio->page);
> +			continue;
> +		}
> +
> +		if (!folio_ref_sub_and_test(folio, refs))
> +			continue;
> +
> +		/* hugetlb has its own memcg */
> +		if (folio_test_hugetlb(folio)) {
> +			if (lruvec) {
> +				unlock_page_lruvec_irqrestore(lruvec, flags);
> +				lruvec = NULL;
> +			}
> +			free_huge_folio(folio);
> +			continue;
> +		}
> +
> +		__page_cache_release(folio, &lruvec, &flags);
> +		if (folio_batch_add(&folios, folio) == 0) {
> +			if (lruvec) {
> +				unlock_page_lruvec_irqrestore(lruvec, flags);
> +				lruvec = NULL;
> +			}
> +			mem_cgroup_uncharge_folios(&folios);
> +			free_unref_folios(&folios);
> +		}
> +	}
> +	if (lruvec)
> +		unlock_page_lruvec_irqrestore(lruvec, flags);
> +
> +	if (folios.nr) {
> +		mem_cgroup_uncharge_folios(&folios);
> +		free_unref_folios(&folios);
> +	}
> +}

This still duplicates a lot of the logic in folios_put(), but I see you
have an idea in the other thread for improving this situation - I'll reply
in the context of that thread.

But overall this looks great - thanks for taking the time to put this
together - it will integrate nicely with my mmugather series.

Thanks,
Ryan

> +
>  /**
>   * release_pages - batched put_page()
>   * @arg: array of pages to release
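To make the single-folio containment point concrete: a sketch of a debug
assertion that would encode the invariant, reusing the patch's own loop
variables (hypothetical; not part of the posted patch):

	for (i = 0; i < nr; i++) {
		struct folio *folio = pfn_folio(range[i].start);
		unsigned int refs = range[i].end - range[i].start;

		/* Assumed invariant: the whole pfn range lies in one folio. */
		VM_WARN_ON_ONCE(refs && pfn_folio(range[i].end - 1) != folio);

		/* ... zone-device, refcount and LRU handling as posted ... */
	}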
On Thu, Aug 31, 2023 at 08:03:14PM +0100, Ryan Roberts wrote:
> > +++ b/include/linux/mm.h
> > @@ -1515,6 +1515,13 @@ typedef union {
> >  void release_pages(release_pages_arg, int nr);
> >  void folios_put(struct folio_batch *folios);
> >
> > +struct pfn_range {
> > +	unsigned long start;
> > +	unsigned long end;
> > +};
>
> As mentioned in the other thread, I think we need to better convey that a
> pfn_range must be fully contained within a single folio. Perhaps it's
> better to call it `struct folio_region`?

Yep, I'm not taking a strong position on that.  We can call it whatever
you like, subject to usual nitpicking later.

> > +
> > +void pfn_range_put(struct pfn_range *range, unsigned int nr);
>
> If going with `struct folio_region`, we could call this folios_put_refs()?
>
> Or if you hate that idea and want to stick with pfn, then at least call it
> pfn_ranges_put() (with an s on ranges).

folio_regions_put()?  I don't know at this point; nothing's speaking
to me.

> > +void pfn_range_put(struct pfn_range *range, unsigned int nr)
> > +{
> > +	struct folio_batch folios;
> > +	unsigned int i;
> > +	struct lruvec *lruvec = NULL;
> > +	unsigned long flags = 0;
> > +
> > +	folio_batch_init(&folios);
> > +	for (i = 0; i < nr; i++) {
> > +		struct folio *folio = pfn_folio(range[i].start);
> > +		unsigned int refs = range[i].end - range[i].start;
>
> Don't you need to check for the huge zero page like in folios_put()?
>
> 	if (is_huge_zero_page(&folio->page))
> 		continue;

Maybe?  Can we put the huge zero page in here, or would we filter it
out earlier?

> > +	if (lruvec)
> > +		unlock_page_lruvec_irqrestore(lruvec, flags);
> > +
> > +	if (folios.nr) {
> > +		mem_cgroup_uncharge_folios(&folios);
> > +		free_unref_folios(&folios);
> > +	}
> > +}
>
> This still duplicates a lot of the logic in folios_put(), but I see you
> have an idea in the other thread for improving this situation - I'll reply
> in the context of that thread.
>
> But overall this looks great - thanks for taking the time to put this
> together - it will integrate nicely with my mmugather series.

Thanks!  One thing I was wondering; intended to look at today, but didn't
have time for it -- can we use mmugather to solve how vmscan wants to
handle batched TLB IPIs for mapped pages, instead of having its own thing?
72b252aed506 is the commit to look at.
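For reference, the suggested check would sit at the top of the loop, before
any references are dropped, mirroring folios_put() - a sketch, assuming the
huge zero page can in fact reach this path:

	for (i = 0; i < nr; i++) {
		struct folio *folio = pfn_folio(range[i].start);
		unsigned int refs = range[i].end - range[i].start;

		/* Never free the shared huge zero page by refcount. */
		if (is_huge_zero_page(&folio->page))
			continue;

		/* ... zone-device, refcount and LRU handling as posted ... */
	}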
On 01/09/2023 05:27, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 08:03:14PM +0100, Ryan Roberts wrote:
>>> +++ b/include/linux/mm.h
>>> @@ -1515,6 +1515,13 @@ typedef union {
>>>  void release_pages(release_pages_arg, int nr);
>>>  void folios_put(struct folio_batch *folios);
>>>
>>> +struct pfn_range {
>>> +	unsigned long start;
>>> +	unsigned long end;
>>> +};
>>
>> As mentioned in the other thread, I think we need to better convey that a
>> pfn_range must be fully contained within a single folio. Perhaps it's
>> better to call it `struct folio_region`?
>
> Yep, I'm not taking a strong position on that.  We can call it whatever
> you like, subject to usual nitpicking later.
>
>>> +
>>> +void pfn_range_put(struct pfn_range *range, unsigned int nr);
>>
>> If going with `struct folio_region`, we could call this folios_put_refs()?
>>
>> Or if you hate that idea and want to stick with pfn, then at least call it
>> pfn_ranges_put() (with an s on ranges).
>
> folio_regions_put()?  I don't know at this point; nothing's speaking
> to me.

I'm inclined to call it `struct folio_region` and
free_folio_regions_and_swap_cache(). Then it replaces
free_pages_and_swap_cache() (as per the suggestion against your patch set).

For now, I'll implement free_folio_regions_and_swap_cache() similarly to
release_pages() + free_swap_cache(). Then your patch set just has to
reimplement free_folio_regions_and_swap_cache() using the folio_batch
approach. Shout if that sounds problematic. Otherwise I'll post a new
version of my patch set later today.

>
>>> +void pfn_range_put(struct pfn_range *range, unsigned int nr)
>>> +{
>>> +	struct folio_batch folios;
>>> +	unsigned int i;
>>> +	struct lruvec *lruvec = NULL;
>>> +	unsigned long flags = 0;
>>> +
>>> +	folio_batch_init(&folios);
>>> +	for (i = 0; i < nr; i++) {
>>> +		struct folio *folio = pfn_folio(range[i].start);
>>> +		unsigned int refs = range[i].end - range[i].start;
>>
>> Don't you need to check for the huge zero page like in folios_put()?
>>
>> 	if (is_huge_zero_page(&folio->page))
>> 		continue;
>
> Maybe?  Can we put the huge zero page in here, or would we filter it
> out earlier?

I'm not sure there is any earlier opportunity? The next earliest would be
to filter it out from the mmugather batch, and that feels like confusing
responsibilities too much. So my vote is to put it here.

>
>>> +	if (lruvec)
>>> +		unlock_page_lruvec_irqrestore(lruvec, flags);
>>> +
>>> +	if (folios.nr) {
>>> +		mem_cgroup_uncharge_folios(&folios);
>>> +		free_unref_folios(&folios);
>>> +	}
>>> +}
>>
>> This still duplicates a lot of the logic in folios_put(), but I see you
>> have an idea in the other thread for improving this situation - I'll reply
>> in the context of that thread.
>>
>> But overall this looks great - thanks for taking the time to put this
>> together - it will integrate nicely with my mmugather series.
>
> Thanks!  One thing I was wondering; intended to look at today, but didn't
> have time for it -- can we use mmugather to solve how vmscan wants to
> handle batched TLB IPIs for mapped pages, instead of having its own thing?
> 72b252aed506 is the commit to look at.

Yeah, I thought about that too. mmugather works well for TLBIing a
contiguous range of VAs (it just maintains a range that gets expanded to
the lowest and highest entries). That works well for things like munmap
and mmap_exit, where a virtually contiguous block is being zapped. But I'm
guessing that for vmscan, the pages are usually not contiguous? If so,
then I think mmugather would over-TLBI and likely harm perf?
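A sketch of the helper described above, mirroring the existing
free_pages_and_swap_cache() shape (lru_add_drain(), per-entry
free_swap_cache(), then one batched put); struct folio_region and
folio_regions_put() are this thread's proposals, not mainline code:

	struct folio_region {
		unsigned long start;	/* first pfn, inclusive */
		unsigned long end;	/* last pfn, exclusive; same folio */
	};

	void free_folio_regions_and_swap_cache(struct folio_region *regions,
					       unsigned int nr)
	{
		unsigned int i;

		lru_add_drain();
		for (i = 0; i < nr; i++)
			free_swap_cache(&pfn_folio(regions[i].start)->page);
		folio_regions_put(regions, nr);	/* pfn_range_put(), renamed */
	}

On the over-TLBI point: mmu_gather tracks a single [start, end) window,
roughly as in include/asm-generic/tlb.h:

	static inline void __tlb_adjust_range(struct mmu_gather *tlb,
					      unsigned long address,
					      unsigned long size)
	{
		/* Expand the window to cover the new address. */
		tlb->start = min(tlb->start, address);
		tlb->end = max(tlb->end, address + size);
	}

So two reclaimed pages a gigabyte apart widen the flush window to
everything between them, which is why sparse vmscan batches would likely
over-invalidate.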
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a517ef8d2386..4f9e2cfb372e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1515,6 +1515,13 @@ typedef union {
 void release_pages(release_pages_arg, int nr);
 void folios_put(struct folio_batch *folios);
 
+struct pfn_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+void pfn_range_put(struct pfn_range *range, unsigned int nr);
+
 static inline void put_page(struct page *page)
 {
 	struct folio *folio = page_folio(page);
diff --git a/mm/swap.c b/mm/swap.c
index 8bd15402cd8f..218d2cc4c6f4 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1027,6 +1027,62 @@ void folios_put(struct folio_batch *folios)
 }
 EXPORT_SYMBOL(folios_put);
 
+void pfn_range_put(struct pfn_range *range, unsigned int nr)
+{
+	struct folio_batch folios;
+	unsigned int i;
+	struct lruvec *lruvec = NULL;
+	unsigned long flags = 0;
+
+	folio_batch_init(&folios);
+	for (i = 0; i < nr; i++) {
+		struct folio *folio = pfn_folio(range[i].start);
+		unsigned int refs = range[i].end - range[i].start;
+
+		if (folio_is_zone_device(folio)) {
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
+			}
+			if (put_devmap_managed_page_refs(&folio->page, refs))
+				continue;
+			if (folio_ref_sub_and_test(folio, refs))
+				free_zone_device_page(&folio->page);
+			continue;
+		}
+
+		if (!folio_ref_sub_and_test(folio, refs))
+			continue;
+
+		/* hugetlb has its own memcg */
+		if (folio_test_hugetlb(folio)) {
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
+			}
+			free_huge_folio(folio);
+			continue;
+		}
+
+		__page_cache_release(folio, &lruvec, &flags);
+		if (folio_batch_add(&folios, folio) == 0) {
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
+			}
+			mem_cgroup_uncharge_folios(&folios);
+			free_unref_folios(&folios);
+		}
+	}
+	if (lruvec)
+		unlock_page_lruvec_irqrestore(lruvec, flags);
+
+	if (folios.nr) {
+		mem_cgroup_uncharge_folios(&folios);
+		free_unref_folios(&folios);
+	}
+}
+
 /**
  * release_pages - batched put_page()
  * @arg: array of pages to release
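For illustration, a hypothetical caller dropping references on two
sub-ranges of one large folio in a single batched call (the function name
and the ref counts are made up for the example):

	/* Drop three refs covering the folio's first three pages and one
	 * more for its sixth page; each entry stays within the folio, and
	 * pfn_range_put() derives refs as end - start per entry. */
	static void example_put_subranges(struct folio *folio)
	{
		unsigned long pfn = folio_pfn(folio);
		struct pfn_range ranges[] = {
			{ .start = pfn,     .end = pfn + 3 },	/* 3 refs */
			{ .start = pfn + 5, .end = pfn + 6 },	/* 1 ref  */
		};

		pfn_range_put(ranges, ARRAY_SIZE(ranges));
	}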
This function will be used imminently.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h |  7 ++++++
 mm/swap.c          | 56 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)