Message ID: 20240802155524.517137-1-david@redhat.com (mailing list archive)
Series: mm: replace follow_page() by folio_walk
On Fri, 2 Aug 2024 17:55:13 +0200 David Hildenbrand <david@redhat.com> wrote:
> I am not able to test the s390x Secure Execution changes, unfortunately.
I'll add it to -next. Could the s390 developers please check this?
On Fri, 2 Aug 2024 17:55:13 +0200 David Hildenbrand <david@redhat.com> wrote:

> Looking into a way of moving the last folio_likely_mapped_shared() call
> in add_folio_for_migration() under the PTL, I found myself removing
> follow_page(). This paves the way for cleaning up all the FOLL_, follow_*
> terminology to just be called "GUP" nowadays.
>
> The new page table walker will look up a mapped folio and return to the
> caller with the PTL held, such that the folio cannot get unmapped
> concurrently. Callers can then conditionally decide whether they really
> want to take a short-term folio reference or whether they can simply
> unlock the PTL and be done with it.
>
> folio_walk is similar to page_vma_mapped_walk(), except that we don't know
> the folio we want to walk to and that we are only walking to exactly one
> PTE/PMD/PUD.
>
> folio_walk provides access to the pte/pmd/pud (and the referenced folio
> page because things like KSM need that); however, as part of this series
> no page table modifications are performed by users.
>
> We might be able to convert some other walk_page_range() users that really
> only walk to one address, such as DAMON with
> damon_mkold_ops/damon_young_ops. It might make sense to extend folio_walk
> in the future to optionally fault in a folio (if applicable), such that we
> can replace some get_user_pages() users that really only want to look up
> a single page/folio under the PTL without unconditionally grabbing a folio
> reference.
>
> I have plans to extend the approach to a range walker that will try
> batching various page table entries (not just folio pages) to be a better
> replacement for walk_page_range() -- and users will be able to opt in to
> which type of page table entries they want to process -- but that will
> require more work and more thought.
>
> KSM seems to work just fine (ksm_functional_tests selftests) and
> move_pages seems to work (migration selftest). I tested the leaf
> implementation extensively using various hugetlb sizes (64K, 2M, 32M, 1G)
> on arm64 using move_pages and did some more testing on x86-64. Cross
> compiled on a bunch of architectures.
>
> I am not able to test the s390x Secure Execution changes, unfortunately.

The series looks good; we will do some tests and report back if everything
is OK.

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>
> David Hildenbrand (11):
>   mm: provide vm_normal_(page|folio)_pmd() with
>     CONFIG_PGTABLE_HAS_HUGE_LEAVES
>   mm/pagewalk: introduce folio_walk_start() + folio_walk_end()
>   mm/migrate: convert do_pages_stat_array() from follow_page() to
>     folio_walk
>   mm/migrate: convert add_page_for_migration() from follow_page() to
>     folio_walk
>   mm/ksm: convert get_mergeable_page() from follow_page() to folio_walk
>   mm/ksm: convert scan_get_next_rmap_item() from follow_page() to
>     folio_walk
>   mm/huge_memory: convert split_huge_pages_pid() from follow_page() to
>     folio_walk
>   s390/uv: convert gmap_destroy_page() from follow_page() to folio_walk
>   s390/mm/fault: convert do_secure_storage_access() from follow_page()
>     to folio_walk
>   mm: remove follow_page()
>   mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk
>
>  Documentation/mm/transhuge.rst |   6 +-
>  arch/s390/kernel/uv.c          |  18 ++-
>  arch/s390/mm/fault.c           |  16 ++-
>  include/linux/mm.h             |   3 -
>  include/linux/pagewalk.h       |  58 ++++++++++
>  mm/filemap.c                   |   2 +-
>  mm/gup.c                       |  24 +---
>  mm/huge_memory.c               |  18 +--
>  mm/ksm.c                       | 127 +++++++++------------
>  mm/memory.c                    |   2 +-
>  mm/migrate.c                   | 131 ++++++++++-----------
>  mm/nommu.c                     |   6 -
>  mm/pagewalk.c                  | 202 +++++++++++++++++++++++++++++++++
>  13 files changed, 413 insertions(+), 200 deletions(-)
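The caller pattern the cover letter describes (walk to exactly one address, act on the folio under the PTL, then either take a short-term reference or just unlock) can be sketched roughly as below. This is an illustrative sketch based on the cover letter's description, not code from the series itself; `want_short_term_reference()` is a hypothetical caller-side predicate, and the exact `folio_walk` struct fields and flags live in the series' `include/linux/pagewalk.h` and may differ in detail.

```c
/*
 * Sketch of the folio_walk caller pattern: folio_walk_start() looks up
 * the folio mapped at addr and, on success, returns with the page table
 * lock (PTL) held, so the folio cannot get unmapped concurrently.
 */
static struct folio *lookup_folio_at(struct vm_area_struct *vma,
				     unsigned long addr)
{
	struct folio_walk fw;
	struct folio *folio;

	/* On success, the PTL is held and fw describes the mapping. */
	folio = folio_walk_start(&fw, vma, addr, 0);
	if (!folio)
		return NULL;

	if (want_short_term_reference(folio)) {	/* hypothetical predicate */
		/* Take a reference before dropping the PTL ... */
		folio_get(folio);
		folio_walk_end(&fw, vma);
		/* ... and let the caller work on the folio unlocked. */
		return folio;
	}

	/* Otherwise simply unlock the PTL and be done with it. */
	folio_walk_end(&fw, vma);
	return NULL;
}
```

Compared with follow_page(), the point of this shape is that a caller which only needs to inspect the folio under the PTL never has to grab a reference at all.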
On Fri, 2 Aug 2024 17:55:13 +0200 David Hildenbrand <david@redhat.com> wrote:

[...]

> I am not able to test the s390x Secure Execution changes, unfortunately.

The whole series looks good to me, but I do not feel confident enough
about all the folio details to actually r-b any of the non-s390 patches.
(I do have a few questions, though.)

As for the s390 patches: they look fine. I have tested the series on
s390 and nothing caught fire.

We will be able to get more CI coverage once this lands in -next.

[...]
On 07.08.24 11:15, Claudio Imbrenda wrote:
> On Fri, 2 Aug 2024 17:55:13 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
> [...]
>
>> I am not able to test the s390x Secure Execution changes, unfortunately.
>
> The whole series looks good to me, but I do not feel confident enough
> about all the folio details to actually r-b any of the non-s390
> patches. (I do have a few questions, though.)
>
> As for the s390 patches: they look fine. I have tested the series on
> s390 and nothing caught fire.
>
> We will be able to get more CI coverage once this lands in -next.

Thanks for the review! Note that it's already in -next.