[v1,2/3] mm: Implement folio_remove_rmap_range()

Message ID	20230717143110.260162-3-ryan.roberts@arm.com (mailing list archive)
State	New
Headers
Return-Path: <owner-linux-mm@kvack.org>
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, Yin Fengwei <fengwei.yin@intel.com>, David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>, Yang Shi <shy828301@gmail.com>, "Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 2/3] mm: Implement folio_remove_rmap_range()
Date: Mon, 17 Jul 2023 15:31:09 +0100
Message-Id: <20230717143110.260162-3-ryan.roberts@arm.com>
In-Reply-To: <20230717143110.260162-1-ryan.roberts@arm.com>
References: <20230717143110.260162-1-ryan.roberts@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
Series	Optimize large folio interaction with deferred split
	[v1,0/3] Optimize large folio interaction with deferred split
	[v1,1/3] mm: Allow deferred splitting of arbitrary large anon folios
	[v1,2/3] mm: Implement folio_remove_rmap_range()
	[v1,3/3] mm: Batch-zap large anonymous folio PTE mappings

Ryan Roberts July 17, 2023, 2:31 p.m. UTC

Like page_remove_rmap() but batch-removes the rmap for a range of pages
belonging to a folio. This can provide a small speedup due to less
manipulation of the various counters. But more crucially, if removing the
rmap for all pages of a folio in a batch, there is no need to
(spuriously) add it to the deferred split list, which saves significant
cost when there is contention for the split queue lock.

All contained pages are accounted using the order-0 folio (or base page)
scheme.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/rmap.h |  2 ++
 mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)
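
Editor's sketch: the deferred-split argument in the commit message can be
illustrated with a small userspace model. This is not kernel code. It uses
C11 atomics in place of the kernel's atomic_t helpers, ignores the
COMPOUND_MAPPED (PMD-mapped) case and all locking, and every model_* name
and the 16-page folio size are assumptions made for illustration. The model
queues the folio for deferred split whenever a call unmaps at least one page
while at least one page remains mapped, which is the condition stated in the
patch's comment.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_PAGES 16	/* hypothetical folio size used by the model */

struct model_folio {
	atomic_int mapcount[MODEL_NR_PAGES];	/* -1 means "not mapped" */
	atomic_int nr_pages_mapped;		/* pages with a PTE mapping */
	int split_queue_hits;			/* times the folio was queued */
};

/* Map every page of the folio exactly once and reset the queue counter. */
static void model_map_all(struct model_folio *f)
{
	for (int i = 0; i < MODEL_NR_PAGES; i++)
		atomic_init(&f->mapcount[i], 0);
	atomic_init(&f->nr_pages_mapped, MODEL_NR_PAGES);
	f->split_queue_hits = 0;
}

/* Remove one mapping from pages [first, first + nr) of the folio. */
static void model_remove_rmap_range(struct model_folio *f, int first, int nr)
{
	int nr_unmapped = 0;
	int nr_mapped;

	assert(first >= 0 && nr > 0 && first + nr <= MODEL_NR_PAGES);

	for (int i = first; i < first + nr; i++) {
		/* An old value of 0 means this was the page's last mapping. */
		if (atomic_fetch_sub(&f->mapcount[i], 1) == 0)
			nr_unmapped++;
	}

	/* atomic_fetch_sub() returns the old value, so subtract again. */
	nr_mapped = atomic_fetch_sub(&f->nr_pages_mapped, nr_unmapped)
		    - nr_unmapped;

	/* Queue only if something was unmapped and something stays mapped. */
	if (nr_unmapped && nr_mapped)
		f->split_queue_hits++;
}

/* Unmap the whole folio one page per call; returns the queue hit count. */
static int model_unmap_per_page(void)
{
	struct model_folio f;

	model_map_all(&f);
	for (int i = 0; i < MODEL_NR_PAGES; i++)
		model_remove_rmap_range(&f, i, 1);
	return f.split_queue_hits;
}

/* Unmap the whole folio in a single call; returns the queue hit count. */
static int model_unmap_batched(void)
{
	struct model_folio f;

	model_map_all(&f);
	model_remove_rmap_range(&f, 0, MODEL_NR_PAGES);
	return f.split_queue_hits;
}
```

In this model, model_unmap_per_page() queues the 16-page folio 15 times,
once for every call that leaves some page mapped, while
model_unmap_batched() never queues it. In the kernel each of those queue
operations would take the split queue lock, which is the contention the
commit message refers to.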

Matthew Wilcox July 17, 2023, 3:07 p.m. UTC | #1

On Mon, Jul 17, 2023 at 03:31:09PM +0100, Ryan Roberts wrote:
> +/*
> + * folio_remove_rmap_range - take down pte mappings from a range of pages
> + * belonging to a folio. All pages are accounted as small pages.
> + * @folio:	folio that all pages belong to
> + * @page:       first page in range to remove mapping from
> + * @nr:		number of pages in range to remove mapping from
> + * @vma:        the vm area from which the mapping is removed
> + *
> + * The caller needs to hold the pte lock.
> + */

This could stand a little reworking.  How about this?

/**
 * folio_remove_rmap_range - Take down PTE mappings from a range of pages.
 * @folio: Folio containing all pages in range.
 * @page: First page in range to unmap.
 * @nr: Number of pages to unmap.
 * @vma: The VM area containing the range.
 *
 * All pages in the range must belong to the same VMA & folio.  They
 * must be mapped with PTEs, not a PMD.
 *
 * Context: Caller holds the pte lock.
 */

> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +					int nr, struct vm_area_struct *vma)
> +{
> +	atomic_t *mapped = &folio->_nr_pages_mapped;
> +	int nr_unmapped = 0;
> +	int nr_mapped;
> +	bool last;
> +	enum node_stat_item idx;
> +
> +	if (unlikely(folio_test_hugetlb(folio))) {
> +		VM_WARN_ON_FOLIO(1, folio);
> +		return;
> +	}
> +
> +	if (!folio_test_large(folio)) {
> +		/* Is this the page's last map to be removed? */
> +		last = atomic_add_negative(-1, &page->_mapcount);
> +		nr_unmapped = last;
> +	} else {
> +		for (; nr != 0; nr--, page++) {
> +			/* Is this the page's last map to be removed? */
> +			last = atomic_add_negative(-1, &page->_mapcount);
> +			if (last) {
> +				/* Page still mapped if folio mapped entirely */
> +				nr_mapped = atomic_dec_return_relaxed(mapped);

We're still doing one atomic op per page on the folio's nr_pages_mapped
... is it possible to batch this and use atomic_sub_return_relaxed()?

Zi Yan July 17, 2023, 3:09 p.m. UTC | #2

On 17 Jul 2023, at 10:31, Ryan Roberts wrote:

> Like page_remove_rmap() but batch-removes the rmap for a range of pages
> belonging to a folio. This can provide a small speedup due to less
> manipulation of the various counters. But more crucially, if removing the
> rmap for all pages of a folio in a batch, there is no need to
> (spuriously) add it to the deferred split list, which saves significant
> cost when there is contention for the split queue lock.
>
> All contained pages are accounted using the order-0 folio (or base page)
> scheme.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/linux/rmap.h |  2 ++
>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b87d01660412..f578975c12c0 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +		int nr, struct vm_area_struct *vma);
>
>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>  		unsigned long address, rmap_t flags);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2baf57d65c23..1da05aca2bb1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  	mlock_vma_folio(folio, vma, compound);
>  }
>
> +/*
> + * folio_remove_rmap_range - take down pte mappings from a range of pages
> + * belonging to a folio. All pages are accounted as small pages.
> + * @folio:	folio that all pages belong to
> + * @page:       first page in range to remove mapping from
> + * @nr:		number of pages in range to remove mapping from

We might need some checks to make sure [page, page+nr) is in the range of
the folio. Something like:

page >= &folio->page && page + nr <= (&folio->page + folio_nr_pages(folio))

> + * @vma:        the vm area from which the mapping is removed
> + *
> + * The caller needs to hold the pte lock.
> + */
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +					int nr, struct vm_area_struct *vma)
> +{
> +	atomic_t *mapped = &folio->_nr_pages_mapped;
> +	int nr_unmapped = 0;
> +	int nr_mapped;
> +	bool last;
> +	enum node_stat_item idx;
> +
> +	if (unlikely(folio_test_hugetlb(folio))) {
> +		VM_WARN_ON_FOLIO(1, folio);
> +		return;
> +	}
> +
> +	if (!folio_test_large(folio)) {
> +		/* Is this the page's last map to be removed? */
> +		last = atomic_add_negative(-1, &page->_mapcount);
> +		nr_unmapped = last;
> +	} else {
> +		for (; nr != 0; nr--, page++) {
> +			/* Is this the page's last map to be removed? */
> +			last = atomic_add_negative(-1, &page->_mapcount);
> +			if (last) {
> +				/* Page still mapped if folio mapped entirely */
> +				nr_mapped = atomic_dec_return_relaxed(mapped);
> +				if (nr_mapped < COMPOUND_MAPPED)
> +					nr_unmapped++;
> +			}
> +		}
> +	}
> +
> +	if (nr_unmapped) {
> +		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
> +		__lruvec_stat_mod_folio(folio, idx, -nr_unmapped);
> +
> +		/*
> +		 * Queue anon THP for deferred split if we have just unmapped at
> +		 * least 1 page, while at least 1 page remains mapped.
> +		 */
> +		if (folio_test_large(folio) && folio_test_anon(folio))
> +			if (nr_mapped)
> +				deferred_split_folio(folio);
> +	}
> +
> +	/*
> +	 * It would be tidy to reset folio_test_anon mapping when fully
> +	 * unmapped, but that might overwrite a racing page_add_anon_rmap
> +	 * which increments mapcount after us but sets mapping before us:
> +	 * so leave the reset to free_pages_prepare, and remember that
> +	 * it's only reliable while mapped.
> +	 */
> +
> +	munlock_vma_folio(folio, vma, false);
> +}
> +
>  /**
>   * page_remove_rmap - take down pte mapping from a page
>   * @page:	page to remove mapping from
> -- 
> 2.25.1

Everything else looks good to me. Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi

Ryan Roberts July 17, 2023, 3:49 p.m. UTC | #3

On 17/07/2023 16:07, Matthew Wilcox wrote:
> On Mon, Jul 17, 2023 at 03:31:09PM +0100, Ryan Roberts wrote:
>> +/*
>> + * folio_remove_rmap_range - take down pte mappings from a range of pages
>> + * belonging to a folio. All pages are accounted as small pages.
>> + * @folio:	folio that all pages belong to
>> + * @page:       first page in range to remove mapping from
>> + * @nr:		number of pages in range to remove mapping from
>> + * @vma:        the vm area from which the mapping is removed
>> + *
>> + * The caller needs to hold the pte lock.
>> + */
> 
> This could stand a little reworking.  How about this?
> 
> /**
>  * folio_remove_rmap_range - Take down PTE mappings from a range of pages.
>  * @folio: Folio containing all pages in range.
>  * @page: First page in range to unmap.
>  * @nr: Number of pages to unmap.
>  * @vma: The VM area containing the range.
>  *
>  * All pages in the range must belong to the same VMA & folio.  They
>  * must be mapped with PTEs, not a PMD.
>  *
>  * Context: Caller holds the pte lock.
>  */

LGTM! thanks.

> 
>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>> +					int nr, struct vm_area_struct *vma)
>> +{
>> +	atomic_t *mapped = &folio->_nr_pages_mapped;
>> +	int nr_unmapped = 0;
>> +	int nr_mapped;
>> +	bool last;
>> +	enum node_stat_item idx;
>> +
>> +	if (unlikely(folio_test_hugetlb(folio))) {
>> +		VM_WARN_ON_FOLIO(1, folio);
>> +		return;
>> +	}
>> +
>> +	if (!folio_test_large(folio)) {
>> +		/* Is this the page's last map to be removed? */
>> +		last = atomic_add_negative(-1, &page->_mapcount);
>> +		nr_unmapped = last;
>> +	} else {
>> +		for (; nr != 0; nr--, page++) {
>> +			/* Is this the page's last map to be removed? */
>> +			last = atomic_add_negative(-1, &page->_mapcount);
>> +			if (last) {
>> +				/* Page still mapped if folio mapped entirely */
>> +				nr_mapped = atomic_dec_return_relaxed(mapped);
> 
> We're still doing one atomic op per page on the folio's nr_pages_mapped
> ... is it possible to batch this and use atomic_sub_return_relaxed()?

Good spot, something like this:

	} else {
		for (; nr != 0; nr--, page++) {
			/* Is this the page's last map to be removed? */
			last = atomic_add_negative(-1, &page->_mapcount);
			if (last)
				nr_unmapped++;
		}

		/* Pages still mapped if folio mapped entirely */
		nr_mapped = atomic_sub_return_relaxed(nr_unmapped, mapped);
		if (nr_mapped >= COMPOUND_MAPPED)
			nr_unmapped = 0;
	}
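
Editor's sketch: a single-threaded userspace model of the two variants
discussed above, one that decrements the folio-wide counter once per page
and one that subtracts once per range. It is not kernel code. C11 atomics
stand in for atomic_add_negative() and atomic_sub_return_relaxed(), the
model_* names and the 4-page range are assumptions, and the value of
MODEL_COMPOUND_MAPPED is assumed to mirror COMPOUND_MAPPED in
mm/internal.h. The model says nothing about races with a concurrent PMD
unmap.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed to mirror COMPOUND_MAPPED in mm/internal.h; illustrative only. */
#define MODEL_COMPOUND_MAPPED 0x800000
#define MODEL_RANGE 4	/* hypothetical number of pages in the range */

struct model_result {
	int nr_unmapped;	/* pages to subtract from the mapped stat */
	int shared_rmw_ops;	/* atomic RMW ops on the folio-wide counter */
	int nr_mapped_after;	/* final value of the folio-wide counter */
};

/* Like the kernel's sub_return: subtract and return the new value. */
static int model_sub_return_relaxed(int n, atomic_int *v)
{
	return atomic_fetch_sub_explicit(v, n, memory_order_relaxed) - n;
}

/* Drop one mapping; true if it was the page's last (0 becomes -1). */
static int model_drop_last(atomic_int *mapcount)
{
	return atomic_fetch_sub(mapcount, 1) == 0;
}

/* Variant from the posted patch: one shared-counter op per page. */
static struct model_result model_per_page(atomic_int *mapcounts, int nr,
					  atomic_int *mapped)
{
	struct model_result r = { 0, 0, 0 };

	for (int i = 0; i < nr; i++) {
		if (model_drop_last(&mapcounts[i])) {
			int nr_mapped = model_sub_return_relaxed(1, mapped);

			r.shared_rmw_ops++;
			if (nr_mapped < MODEL_COMPOUND_MAPPED)
				r.nr_unmapped++;
		}
	}
	r.nr_mapped_after = atomic_load(mapped);
	return r;
}

/* Batched variant: count last unmaps first, then one shared-counter op. */
static struct model_result model_batched(atomic_int *mapcounts, int nr,
					 atomic_int *mapped)
{
	struct model_result r = { 0, 0, 0 };

	for (int i = 0; i < nr; i++)
		if (model_drop_last(&mapcounts[i]))
			r.nr_unmapped++;

	r.nr_mapped_after = model_sub_return_relaxed(r.nr_unmapped, mapped);
	r.shared_rmw_ops = 1;

	/* Pages still count as mapped if the folio is mapped entirely. */
	if (r.nr_mapped_after >= MODEL_COMPOUND_MAPPED)
		r.nr_unmapped = 0;
	return r;
}

/* Run one variant on MODEL_RANGE pages, each mapped exactly once. */
static struct model_result model_run(int batched, int initial_mapped)
{
	atomic_int mapcounts[MODEL_RANGE];
	atomic_int mapped;

	for (int i = 0; i < MODEL_RANGE; i++)
		atomic_init(&mapcounts[i], 0);
	atomic_init(&mapped, initial_mapped);

	return batched ? model_batched(mapcounts, MODEL_RANGE, &mapped)
		       : model_per_page(mapcounts, MODEL_RANGE, &mapped);
}
```

With the counter starting at 8, both variants report 4 unmapped pages and
leave the counter at 4, but the batched variant touches the shared counter
once instead of four times. With the counter starting at
MODEL_COMPOUND_MAPPED + 8, both report 0 unmapped pages, because the pages
remain covered by the entire-folio mapping.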

Ryan Roberts July 17, 2023, 3:51 p.m. UTC | #4

On 17/07/2023 16:09, Zi Yan wrote:
> On 17 Jul 2023, at 10:31, Ryan Roberts wrote:
> 
>> Like page_remove_rmap() but batch-removes the rmap for a range of pages
>> belonging to a folio. This can provide a small speedup due to less
>> manipulation of the various counters. But more crucially, if removing the
>> rmap for all pages of a folio in a batch, there is no need to
>> (spuriously) add it to the deferred split list, which saves significant
>> cost when there is contention for the split queue lock.
>>
>> All contained pages are accounted using the order-0 folio (or base page)
>> scheme.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  include/linux/rmap.h |  2 ++
>>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 67 insertions(+)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index b87d01660412..f578975c12c0 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>> +		int nr, struct vm_area_struct *vma);
>>
>>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>>  		unsigned long address, rmap_t flags);
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2baf57d65c23..1da05aca2bb1 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>  	mlock_vma_folio(folio, vma, compound);
>>  }
>>
>> +/*
>> + * folio_remove_rmap_range - take down pte mappings from a range of pages
>> + * belonging to a folio. All pages are accounted as small pages.
>> + * @folio:	folio that all pages belong to
>> + * @page:       first page in range to remove mapping from
>> + * @nr:		number of pages in range to remove mapping from
> 
> We might need some checks to make sure [page, page+nr) is in the range of
> the folio. Something like:
> 
> page >= &folio->page && page + nr <= (&folio->page + folio_nr_pages(folio))

No problem. Is a VM_WARN_ON() appropriate for something like this?

> 
>> + * @vma:        the vm area from which the mapping is removed
>> + *
>> + * The caller needs to hold the pte lock.
>> + */
>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>> +					int nr, struct vm_area_struct *vma)
>> +{
>> +	atomic_t *mapped = &folio->_nr_pages_mapped;
>> +	int nr_unmapped = 0;
>> +	int nr_mapped;
>> +	bool last;
>> +	enum node_stat_item idx;
>> +
>> +	if (unlikely(folio_test_hugetlb(folio))) {
>> +		VM_WARN_ON_FOLIO(1, folio);
>> +		return;
>> +	}
>> +
>> +	if (!folio_test_large(folio)) {
>> +		/* Is this the page's last map to be removed? */
>> +		last = atomic_add_negative(-1, &page->_mapcount);
>> +		nr_unmapped = last;
>> +	} else {
>> +		for (; nr != 0; nr--, page++) {
>> +			/* Is this the page's last map to be removed? */
>> +			last = atomic_add_negative(-1, &page->_mapcount);
>> +			if (last) {
>> +				/* Page still mapped if folio mapped entirely */
>> +				nr_mapped = atomic_dec_return_relaxed(mapped);
>> +				if (nr_mapped < COMPOUND_MAPPED)
>> +					nr_unmapped++;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (nr_unmapped) {
>> +		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
>> +		__lruvec_stat_mod_folio(folio, idx, -nr_unmapped);
>> +
>> +		/*
>> +		 * Queue anon THP for deferred split if we have just unmapped at
>> +		 * least 1 page, while at least 1 page remains mapped.
>> +		 */
>> +		if (folio_test_large(folio) && folio_test_anon(folio))
>> +			if (nr_mapped)
>> +				deferred_split_folio(folio);
>> +	}
>> +
>> +	/*
>> +	 * It would be tidy to reset folio_test_anon mapping when fully
>> +	 * unmapped, but that might overwrite a racing page_add_anon_rmap
>> +	 * which increments mapcount after us but sets mapping before us:
>> +	 * so leave the reset to free_pages_prepare, and remember that
>> +	 * it's only reliable while mapped.
>> +	 */
>> +
>> +	munlock_vma_folio(folio, vma, false);
>> +}
>> +
>>  /**
>>   * page_remove_rmap - take down pte mapping from a page
>>   * @page:	page to remove mapping from
>> -- 
>> 2.25.1
> 
> Everything else looks good to me. Reviewed-by: Zi Yan <ziy@nvidia.com>
> 
> --
> Best Regards,
> Yan, Zi

Zi Yan July 17, 2023, 3:53 p.m. UTC | #5

On 17 Jul 2023, at 11:51, Ryan Roberts wrote:

> On 17/07/2023 16:09, Zi Yan wrote:
>> On 17 Jul 2023, at 10:31, Ryan Roberts wrote:
>>
>>> Like page_remove_rmap() but batch-removes the rmap for a range of pages
>>> belonging to a folio. This can provide a small speedup due to less
>>> manipulation of the various counters. But more crucially, if removing the
>>> rmap for all pages of a folio in a batch, there is no need to
>>> (spuriously) add it to the deferred split list, which saves significant
>>> cost when there is contention for the split queue lock.
>>>
>>> All contained pages are accounted using the order-0 folio (or base page)
>>> scheme.
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>  include/linux/rmap.h |  2 ++
>>>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 67 insertions(+)
>>>
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index b87d01660412..f578975c12c0 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>>  		bool compound);
>>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>>  		bool compound);
>>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>>> +		int nr, struct vm_area_struct *vma);
>>>
>>>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>>>  		unsigned long address, rmap_t flags);
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 2baf57d65c23..1da05aca2bb1 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>  	mlock_vma_folio(folio, vma, compound);
>>>  }
>>>
>>> +/*
>>> + * folio_remove_rmap_range - take down pte mappings from a range of pages
>>> + * belonging to a folio. All pages are accounted as small pages.
>>> + * @folio:	folio that all pages belong to
>>> + * @page:       first page in range to remove mapping from
>>> + * @nr:		number of pages in range to remove mapping from
>>
>> We might need some checks to make sure [page, page+nr) is in the range of
>> the folio. Something like:
>>
>> page >= &folio->page && page + nr <= (&folio->page + folio_nr_pages(folio))
>
> No problem. Is a VM_WARN_ON() appropriate for something like this?

VM_WARN_ON_ONCE() might be better.

--
Best Regards,
Yan, Zi
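
Editor's sketch: a userspace model of the range check and the warn-once
behaviour discussed in this sub-thread. It is not kernel code. The range is
treated as half-open, [page, page + nr), so a range that ends exactly at
the folio's last page passes the check. The model_* names, struct
model_page and the warn-once helper are assumptions standing in for struct
page, struct folio and VM_WARN_ON_ONCE(). The check works on page indexes
rather than on "page + nr" so that the model never forms a pointer past the
end of its backing array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_page { int unused; };	/* stand-in for struct page */

struct model_folio {
	struct model_page *first;	/* first page of the folio */
	long nr_pages;			/* number of pages in the folio */
};

static int model_warnings_printed;	/* how many warnings were emitted */

/* Report a failed check only the first time it fires; return the condition. */
static bool model_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warnings_printed++;
		fprintf(stderr, "warning: %s\n", what);
	}
	return cond;
}

/* True if [page, page + nr) lies entirely inside the folio. */
static bool model_range_in_folio(const struct model_folio *folio,
				 const struct model_page *page, long nr)
{
	ptrdiff_t off = page - folio->first;	/* index of page in folio */

	return off >= 0 && nr > 0 && off < folio->nr_pages &&
	       nr <= folio->nr_pages - off;
}

/* Checked entry point: warns once on a bad range, returns true if valid. */
static bool model_check_range(const struct model_folio *folio,
			      const struct model_page *page, long nr)
{
	return !model_warn_on_once(!model_range_in_folio(folio, page, nr),
				   "page range is outside the folio");
}
```

For a folio that covers pages 4 to 11 of a 16-entry array, a range of 8
pages starting at page 4 passes, while 9 pages from page 4, or any range
starting at page 3 or page 12, fails. Repeated failures through
model_check_range() print a single warning, which is the behaviour a
once-only warning is meant to give on a hot path.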

Matthew Wilcox July 17, 2023, 3:56 p.m. UTC | #6

On Mon, Jul 17, 2023 at 04:49:19PM +0100, Ryan Roberts wrote:
> > We're still doing one atomic op per page on the folio's nr_pages_mapped
> > ... is it possible to batch this and use atomic_sub_return_relaxed()?
> 
> Good spot, something like this:
> 
> 	} else {
> 		for (; nr != 0; nr--, page++) {
> 			/* Is this the page's last map to be removed? */
> 			last = atomic_add_negative(-1, &page->_mapcount);
> 			if (last)
> 				nr_unmapped++;
> 		}
> 
> 		/* Pages still mapped if folio mapped entirely */
> 		nr_mapped = atomic_sub_return_relaxed(nr_unmapped, mapped);
> 		if (nr_mapped >= COMPOUND_MAPPED)
> 			nr_unmapped = 0;
> 	}

I think that's right, but my eyes always go slightly crossed trying to
read the new mapcount scheme.

Yin, Fengwei July 18, 2023, 1:14 a.m. UTC | #7

On 7/17/23 22:31, Ryan Roberts wrote:
> Like page_remove_rmap() but batch-removes the rmap for a range of pages
> belonging to a folio. This can provide a small speedup due to less
> manipulation of the various counters. But more crucially, if removing the
> rmap for all pages of a folio in a batch, there is no need to
> (spuriously) add it to the deferred split list, which saves significant
> cost when there is contention for the split queue lock.
> 
> All contained pages are accounted using the order-0 folio (or base page)
> scheme.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>

Regards
Yin, Fengwei

> ---
>  include/linux/rmap.h |  2 ++
>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b87d01660412..f578975c12c0 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +		int nr, struct vm_area_struct *vma);
>  
>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>  		unsigned long address, rmap_t flags);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2baf57d65c23..1da05aca2bb1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  	mlock_vma_folio(folio, vma, compound);
>  }
>  
> +/*
> + * folio_remove_rmap_range - take down pte mappings from a range of pages
> + * belonging to a folio. All pages are accounted as small pages.
> + * @folio:	folio that all pages belong to
> + * @page:       first page in range to remove mapping from
> + * @nr:		number of pages in range to remove mapping from
> + * @vma:        the vm area from which the mapping is removed
> + *
> + * The caller needs to hold the pte lock.
> + */
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +					int nr, struct vm_area_struct *vma)
> +{
> +	atomic_t *mapped = &folio->_nr_pages_mapped;
> +	int nr_unmapped = 0;
> +	int nr_mapped;
> +	bool last;
> +	enum node_stat_item idx;
> +
> +	if (unlikely(folio_test_hugetlb(folio))) {
> +		VM_WARN_ON_FOLIO(1, folio);
> +		return;
> +	}
> +
> +	if (!folio_test_large(folio)) {
> +		/* Is this the page's last map to be removed? */
> +		last = atomic_add_negative(-1, &page->_mapcount);
> +		nr_unmapped = last;
> +	} else {
> +		for (; nr != 0; nr--, page++) {
> +			/* Is this the page's last map to be removed? */
> +			last = atomic_add_negative(-1, &page->_mapcount);
> +			if (last) {
> +				/* Page still mapped if folio mapped entirely */
> +				nr_mapped = atomic_dec_return_relaxed(mapped);
> +				if (nr_mapped < COMPOUND_MAPPED)
> +					nr_unmapped++;
> +			}
> +		}
> +	}
> +
> +	if (nr_unmapped) {
> +		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
> +		__lruvec_stat_mod_folio(folio, idx, -nr_unmapped);
> +
> +		/*
> +		 * Queue anon THP for deferred split if we have just unmapped at
> +		 * least 1 page, while at least 1 page remains mapped.
> +		 */
> +		if (folio_test_large(folio) && folio_test_anon(folio))
> +			if (nr_mapped)
> +				deferred_split_folio(folio);
> +	}
> +
> +	/*
> +	 * It would be tidy to reset folio_test_anon mapping when fully
> +	 * unmapped, but that might overwrite a racing page_add_anon_rmap
> +	 * which increments mapcount after us but sets mapping before us:
> +	 * so leave the reset to free_pages_prepare, and remember that
> +	 * it's only reliable while mapped.
> +	 */
> +
> +	munlock_vma_folio(folio, vma, false);
> +}
> +
>  /**
>   * page_remove_rmap - take down pte mapping from a page
>   * @page:	page to remove mapping from

Huang Ying July 18, 2023, 6:22 a.m. UTC | #8

Ryan Roberts <ryan.roberts@arm.com> writes:

> Like page_remove_rmap() but batch-removes the rmap for a range of pages
> belonging to a folio. This can provide a small speedup due to less
> manipulation of the various counters. But more crucially, if removing the
> rmap for all pages of a folio in a batch, there is no need to
> (spuriously) add it to the deferred split list, which saves significant
> cost when there is contention for the split queue lock.
>
> All contained pages are accounted using the order-0 folio (or base page)
> scheme.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/linux/rmap.h |  2 ++
>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b87d01660412..f578975c12c0 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +		int nr, struct vm_area_struct *vma);
>  
>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>  		unsigned long address, rmap_t flags);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2baf57d65c23..1da05aca2bb1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  	mlock_vma_folio(folio, vma, compound);
>  }
>  
> +/*
> + * folio_remove_rmap_range - take down pte mappings from a range of pages
> + * belonging to a folio. All pages are accounted as small pages.
> + * @folio:	folio that all pages belong to
> + * @page:       first page in range to remove mapping from
> + * @nr:		number of pages in range to remove mapping from
> + * @vma:        the vm area from which the mapping is removed
> + *
> + * The caller needs to hold the pte lock.
> + */
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +					int nr, struct vm_area_struct *vma)
> +{
> +	atomic_t *mapped = &folio->_nr_pages_mapped;
> +	int nr_unmapped = 0;
> +	int nr_mapped;
> +	bool last;
> +	enum node_stat_item idx;
> +
> +	if (unlikely(folio_test_hugetlb(folio))) {
> +		VM_WARN_ON_FOLIO(1, folio);
> +		return;
> +	}
> +
> +	if (!folio_test_large(folio)) {
> +		/* Is this the page's last map to be removed? */
> +		last = atomic_add_negative(-1, &page->_mapcount);
> +		nr_unmapped = last;
> +	} else {
> +		for (; nr != 0; nr--, page++) {
> +			/* Is this the page's last map to be removed? */
> +			last = atomic_add_negative(-1, &page->_mapcount);
> +			if (last) {
> +				/* Page still mapped if folio mapped entirely */
> +				nr_mapped = atomic_dec_return_relaxed(mapped);
> +				if (nr_mapped < COMPOUND_MAPPED)
> +					nr_unmapped++;
> +			}
> +		}
> +	}
> +
> +	if (nr_unmapped) {
> +		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
> +		__lruvec_stat_mod_folio(folio, idx, -nr_unmapped);
> +
> +		/*
> +		 * Queue anon THP for deferred split if we have just unmapped at

Just some nitpicks.  So feel free to ignore.

s/anon THP/large folio/ ?

> +		 * least 1 page, while at least 1 page remains mapped.
> +		 */
> +		if (folio_test_large(folio) && folio_test_anon(folio))
> +			if (nr_mapped)

                if (folio_test_large(folio) && folio_test_anon(folio) && nr_mapped) ?

> +				deferred_split_folio(folio);
> +	}
> +
> +	/*
> +	 * It would be tidy to reset folio_test_anon mapping when fully
> +	 * unmapped, but that might overwrite a racing page_add_anon_rmap
> +	 * which increments mapcount after us but sets mapping before us:
> +	 * so leave the reset to free_pages_prepare, and remember that
> +	 * it's only reliable while mapped.
> +	 */
> +
> +	munlock_vma_folio(folio, vma, false);
> +}
> +
>  /**
>   * page_remove_rmap - take down pte mapping from a page
>   * @page:	page to remove mapping from

Best Regards,
Huang, Ying

Huang Ying July 18, 2023, 7:12 a.m. UTC | #9

Ryan Roberts <ryan.roberts@arm.com> writes:

> Like page_remove_rmap() but batch-removes the rmap for a range of pages
> belonging to a folio. This can provide a small speedup due to less
> manipulation of the various counters. But more crucially, if removing the
> rmap for all pages of a folio in a batch, there is no need to
> (spuriously) add it to the deferred split list, which saves significant
> cost when there is contention for the split queue lock.
>
> All contained pages are accounted using the order-0 folio (or base page)
> scheme.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/linux/rmap.h |  2 ++
>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b87d01660412..f578975c12c0 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +		int nr, struct vm_area_struct *vma);
>  
>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>  		unsigned long address, rmap_t flags);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2baf57d65c23..1da05aca2bb1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  	mlock_vma_folio(folio, vma, compound);
>  }
>  
> +/*
> + * folio_remove_rmap_range - take down pte mappings from a range of pages
> + * belonging to a folio. All pages are accounted as small pages.
> + * @folio:	folio that all pages belong to
> + * @page:       first page in range to remove mapping from
> + * @nr:		number of pages in range to remove mapping from
> + * @vma:        the vm area from which the mapping is removed
> + *
> + * The caller needs to hold the pte lock.
> + */
> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
> +					int nr, struct vm_area_struct *vma)

Can we call folio_remove_rmap_range() in page_remove_rmap() if
!compound?  That could give us an opportunity to reduce code
duplication.

Best Regards,
Huang, Ying

> +{
> +	atomic_t *mapped = &folio->_nr_pages_mapped;
> +	int nr_unmapped = 0;
> +	int nr_mapped;
> +	bool last;
> +	enum node_stat_item idx;
> +
> +	if (unlikely(folio_test_hugetlb(folio))) {
> +		VM_WARN_ON_FOLIO(1, folio);
> +		return;
> +	}
> +
> +	if (!folio_test_large(folio)) {
> +		/* Is this the page's last map to be removed? */
> +		last = atomic_add_negative(-1, &page->_mapcount);
> +		nr_unmapped = last;
> +	} else {
> +		for (; nr != 0; nr--, page++) {
> +			/* Is this the page's last map to be removed? */
> +			last = atomic_add_negative(-1, &page->_mapcount);
> +			if (last) {
> +				/* Page still mapped if folio mapped entirely */
> +				nr_mapped = atomic_dec_return_relaxed(mapped);
> +				if (nr_mapped < COMPOUND_MAPPED)
> +					nr_unmapped++;
> +			}
> +		}
> +	}
> +
> +	if (nr_unmapped) {
> +		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
> +		__lruvec_stat_mod_folio(folio, idx, -nr_unmapped);
> +
> +		/*
> +		 * Queue anon THP for deferred split if we have just unmapped at
> +		 * least 1 page, while at least 1 page remains mapped.
> +		 */
> +		if (folio_test_large(folio) && folio_test_anon(folio))
> +			if (nr_mapped)
> +				deferred_split_folio(folio);
> +	}
> +
> +	/*
> +	 * It would be tidy to reset folio_test_anon mapping when fully
> +	 * unmapped, but that might overwrite a racing page_add_anon_rmap
> +	 * which increments mapcount after us but sets mapping before us:
> +	 * so leave the reset to free_pages_prepare, and remember that
> +	 * it's only reliable while mapped.
> +	 */
> +
> +	munlock_vma_folio(folio, vma, false);
> +}
> +
>  /**
>   * page_remove_rmap - take down pte mapping from a page
>   * @page:	page to remove mapping from

Ryan Roberts July 18, 2023, 9:51 a.m. UTC | #10

On 18/07/2023 07:22, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
>> Like page_remove_rmap() but batch-removes the rmap for a range of pages
>> belonging to a folio. This can provide a small speedup due to less
>> manipulation of the various counters. But more crucially, if removing the
>> rmap for all pages of a folio in a batch, there is no need to
>> (spuriously) add it to the deferred split list, which saves significant
>> cost when there is contention for the split queue lock.
>>
>> All contained pages are accounted using the order-0 folio (or base page)
>> scheme.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  include/linux/rmap.h |  2 ++
>>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 67 insertions(+)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index b87d01660412..f578975c12c0 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>> +		int nr, struct vm_area_struct *vma);
>>  
>>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>>  		unsigned long address, rmap_t flags);
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2baf57d65c23..1da05aca2bb1 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>  	mlock_vma_folio(folio, vma, compound);
>>  }
>>  
>> +/*
>> + * folio_remove_rmap_range - take down pte mappings from a range of pages
>> + * belonging to a folio. All pages are accounted as small pages.
>> + * @folio:	folio that all pages belong to
>> + * @page:       first page in range to remove mapping from
>> + * @nr:		number of pages in range to remove mapping from
>> + * @vma:        the vm area from which the mapping is removed
>> + *
>> + * The caller needs to hold the pte lock.
>> + */
>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>> +					int nr, struct vm_area_struct *vma)
>> +{
>> +	atomic_t *mapped = &folio->_nr_pages_mapped;
>> +	int nr_unmapped = 0;
>> +	int nr_mapped;
>> +	bool last;
>> +	enum node_stat_item idx;
>> +
>> +	if (unlikely(folio_test_hugetlb(folio))) {
>> +		VM_WARN_ON_FOLIO(1, folio);
>> +		return;
>> +	}
>> +
>> +	if (!folio_test_large(folio)) {
>> +		/* Is this the page's last map to be removed? */
>> +		last = atomic_add_negative(-1, &page->_mapcount);
>> +		nr_unmapped = last;
>> +	} else {
>> +		for (; nr != 0; nr--, page++) {
>> +			/* Is this the page's last map to be removed? */
>> +			last = atomic_add_negative(-1, &page->_mapcount);
>> +			if (last) {
>> +				/* Page still mapped if folio mapped entirely */
>> +				nr_mapped = atomic_dec_return_relaxed(mapped);
>> +				if (nr_mapped < COMPOUND_MAPPED)
>> +					nr_unmapped++;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (nr_unmapped) {
>> +		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
>> +		__lruvec_stat_mod_folio(folio, idx, -nr_unmapped);
>> +
>> +		/*
>> +		 * Queue anon THP for deferred split if we have just unmapped at
> 
> Just some nitpicks, so feel free to ignore.
> 
> s/anon THP/large folio/ ?

ACK

> 
>> +		 * least 1 page, while at least 1 page remains mapped.
>> +		 */
>> +		if (folio_test_large(folio) && folio_test_anon(folio))
>> +			if (nr_mapped)
> 
>                 if (folio_test_large(folio) && folio_test_anon(folio) && nr_mapped) ?

ACK : I'll make these changes for the next version.

> 
>> +				deferred_split_folio(folio);
>> +	}
>> +
>> +	/*
>> +	 * It would be tidy to reset folio_test_anon mapping when fully
>> +	 * unmapped, but that might overwrite a racing page_add_anon_rmap
>> +	 * which increments mapcount after us but sets mapping before us:
>> +	 * so leave the reset to free_pages_prepare, and remember that
>> +	 * it's only reliable while mapped.
>> +	 */
>> +
>> +	munlock_vma_folio(folio, vma, false);
>> +}
>> +
>>  /**
>>   * page_remove_rmap - take down pte mapping from a page
>>   * @page:	page to remove mapping from
> 
> Best Regards,
> Huang, Ying
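
The deferred-split argument in the commit message can be illustrated with a
minimal user-space sketch. This is not kernel code: struct model_folio,
MODEL_MAX_PAGES and the model_* helpers are hypothetical names invented for
the illustration, plain ints stand in for the atomics, and the
COMPOUND_MAPPED case is left out. It only shows that evaluating the "some
pages still mapped" condition once per batch, rather than once per page,
avoids the queueing decision entirely when the whole folio is unmapped.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PAGES 16	/* hypothetical: pages in the modelled folio */

/* Hypothetical stand-in for a large anon folio; not the kernel struct. */
struct model_folio {
	int mapcount[MODEL_MAX_PAGES];	/* per-page count, -1 means unmapped */
	int nr_pages_mapped;		/* pages with at least one PTE mapping */
	int split_queued;		/* times a deferred split was requested */
};

static void model_folio_init(struct model_folio *f)
{
	for (int i = 0; i < MODEL_MAX_PAGES; i++)
		f->mapcount[i] = -1;
	f->nr_pages_mapped = 0;
	f->split_queued = 0;
}

/* Add one PTE mapping to each page in [first, first + nr). */
static void model_map_range(struct model_folio *f, int first, int nr)
{
	assert(first >= 0 && nr > 0 && first + nr <= MODEL_MAX_PAGES);
	for (int i = first; i < first + nr; i++)
		if (++(f->mapcount[i]) == 0)
			f->nr_pages_mapped++;
}

/*
 * Remove one PTE mapping from each page in [first, first + nr).
 * Returns the number of pages that became fully unmapped.
 */
static int model_remove_rmap_range(struct model_folio *f, int first, int nr)
{
	int nr_unmapped = 0;
	int nr_mapped = f->nr_pages_mapped;

	assert(first >= 0 && nr > 0 && first + nr <= MODEL_MAX_PAGES);
	for (int i = first; i < first + nr; i++) {
		bool last;

		assert(f->mapcount[i] >= 0);	/* page must be mapped */
		/* Plays the role of atomic_add_negative(-1, &page->_mapcount) */
		last = --(f->mapcount[i]) < 0;
		if (last) {
			nr_mapped = --(f->nr_pages_mapped);
			nr_unmapped++;
		}
	}

	/* Decide once per batch: unmapped something, something remains mapped */
	if (nr_unmapped && nr_mapped)
		f->split_queued++;

	return nr_unmapped;
}
```

With a fully mapped 16-page folio, a single call covering all 16 pages leaves
split_queued at 0, whereas 16 calls with nr == 1 request a deferred split on
15 of them, since only the final call sees no pages left mapped.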

Ryan Roberts July 18, 2023, 10:02 a.m. UTC | #11

On 18/07/2023 08:12, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
>> Like page_remove_rmap() but batch-removes the rmap for a range of pages
>> belonging to a folio. This can provide a small speedup due to less
>> manipulation of the various counters. But more crucially, if removing the
>> rmap for all pages of a folio in a batch, there is no need to
>> (spuriously) add it to the deferred split list, which saves significant
>> cost when there is contention for the split queue lock.
>>
>> All contained pages are accounted using the order-0 folio (or base page)
>> scheme.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  include/linux/rmap.h |  2 ++
>>  mm/rmap.c            | 65 ++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 67 insertions(+)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index b87d01660412..f578975c12c0 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -200,6 +200,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>> +		int nr, struct vm_area_struct *vma);
>>  
>>  void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
>>  		unsigned long address, rmap_t flags);
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2baf57d65c23..1da05aca2bb1 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1359,6 +1359,71 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>  	mlock_vma_folio(folio, vma, compound);
>>  }
>>  
>> +/*
>> + * folio_remove_rmap_range - take down pte mappings from a range of pages
>> + * belonging to a folio. All pages are accounted as small pages.
>> + * @folio:	folio that all pages belong to
>> + * @page:       first page in range to remove mapping from
>> + * @nr:		number of pages in range to remove mapping from
>> + * @vma:        the vm area from which the mapping is removed
>> + *
>> + * The caller needs to hold the pte lock.
>> + */
>> +void folio_remove_rmap_range(struct folio *folio, struct page *page,
>> +					int nr, struct vm_area_struct *vma)
> 
> Can we call folio_remove_rmap_range() in page_remove_rmap() if
> !compound?  This could give us an opportunity to reduce code
> duplication.

I considered that, but it felt like the savings were pretty small, so my opinion
was that it was cleaner not to do this. This is the best I came up with. Perhaps
you can see further improvements?

void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
		bool compound)
{
	struct folio *folio = page_folio(page);
	atomic_t *mapped = &folio->_nr_pages_mapped;
	int nr = 0, nr_pmdmapped = 0;
	bool last;
	enum node_stat_item idx;

	VM_BUG_ON_PAGE(compound && !PageHead(page), page);

	/* Hugetlb pages are not counted in NR_*MAPPED */
	if (unlikely(folio_test_hugetlb(folio))) {
		/* hugetlb pages are always mapped with pmds */
		atomic_dec(&folio->_entire_mapcount);
		return;
	}

	/* Is page being unmapped by PTE? Is this its last map to be removed? */
	if (likely(!compound)) {
		folio_remove_rmap_range(folio, page, 1, vma);
		return;
	} else if (folio_test_pmd_mappable(folio)) {
		/* That test is redundant: it's for safety or to optimize out */

		last = atomic_add_negative(-1, &folio->_entire_mapcount);
		if (last) {
			nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
			if (likely(nr < COMPOUND_MAPPED)) {
				nr_pmdmapped = folio_nr_pages(folio);
				nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
				/* Raced ahead of another remove and an add? */
				if (unlikely(nr < 0))
					nr = 0;
			} else {
				/* An add of COMPOUND_MAPPED raced ahead */
				nr = 0;
			}
		}
	}

	if (nr_pmdmapped) {
		if (folio_test_anon(folio))
			idx = NR_ANON_THPS;
		else if (folio_test_swapbacked(folio))
			idx = NR_SHMEM_PMDMAPPED;
		else
			idx = NR_FILE_PMDMAPPED;
		__lruvec_stat_mod_folio(folio, idx, -nr_pmdmapped);
	}
	if (nr) {
		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
		__lruvec_stat_mod_folio(folio, idx, -nr);

		/*
		 * Queue anon THP for deferred split if at least one
		 * page of the folio is unmapped and at least one page
		 * is still mapped.
		 */
		if (folio_test_anon(folio) && nr < nr_pmdmapped)
			deferred_split_folio(folio);
	}

	/*
	 * It would be tidy to reset folio_test_anon mapping when fully
	 * unmapped, but that might overwrite a racing page_add_anon_rmap
	 * which increments mapcount after us but sets mapping before us:
	 * so leave the reset to free_pages_prepare, and remember that
	 * it's only reliable while mapped.
	 */

	munlock_vma_folio(folio, vma, compound);
}
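
For the remaining compound branch above, the arithmetic on _nr_pages_mapped
can be sketched in user space. This is an illustration under assumptions,
not kernel code: MODEL_COMPOUND_MAPPED is taken to be 0x800000 with
MODEL_FOLIO_PAGES_MAPPED as the mask below it (my understanding of
mm/internal.h at the time), model_unmap_entire() is a hypothetical name, a
plain int replaces the atomic, the folio is assumed to be anon, and the call
is assumed to remove the last entire (PMD) mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's COMPOUND_MAPPED / FOLIO_PAGES_MAPPED. */
#define MODEL_COMPOUND_MAPPED		0x800000
#define MODEL_FOLIO_PAGES_MAPPED	(MODEL_COMPOUND_MAPPED - 1)

/*
 * Model the removal of the last entire mapping of a folio.
 * *nr_pages_mapped packs the "entirely mapped" flag (MODEL_COMPOUND_MAPPED)
 * together with the number of pages that are also mapped by PTE.
 * Returns how many pages stop being mapped at all; *queue_split reports
 * whether the folio ends up partially mapped.
 */
static int model_unmap_entire(int *nr_pages_mapped, int folio_nr_pages,
			      bool *queue_split)
{
	int nr;

	assert(*nr_pages_mapped >= MODEL_COMPOUND_MAPPED);
	*nr_pages_mapped -= MODEL_COMPOUND_MAPPED;
	nr = *nr_pages_mapped;

	if (nr < MODEL_COMPOUND_MAPPED) {
		/* Pages still mapped by PTE do not become unmapped */
		nr = folio_nr_pages - (nr & MODEL_FOLIO_PAGES_MAPPED);
		if (nr < 0)
			nr = 0;
	} else {
		/* Flag still set: only reachable if an add raced ahead */
		nr = 0;
	}

	/* Same shape as "nr < nr_pmdmapped" guarded by "if (nr)" */
	*queue_split = nr && nr < folio_nr_pages;
	return nr;
}
```

For a 512-page folio that is PMD-mapped and additionally has 3 pages mapped
by PTE, the call reports 509 newly unmapped pages and asks for a deferred
split; with no PTE mappings it reports 512 and does not.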

> 
> Best Regards,
> Huang, Ying
>
