mbox series

[RFC,00/11] THP support for zone device pages

Message ID 20250306044239.3874247-1-balbirs@nvidia.com (mailing list archive)
Headers show
Series THP support for zone device pages | expand

Message

Balbir Singh March 6, 2025, 4:42 a.m. UTC
This patch series adds support for THP migration of zone device pages.
To do so, the patches implement support for folio zone device pages
by adding support for setting up larger order pages.

These patches build on the earlier posts by Ralph Campbell [1]

Two new flags are added in vma_migration to select and mark compound pages.
migrate_vma_setup(), migrate_vma_pages() and migrate_vma_finalize()
support migration of these pages when MIGRATE_VMA_SELECT_COMPOUND
is passed in as arguments.

The series also adds zone device awareness to (m)THP pages along
with fault handling of large zone device private pages. page vma walk
and the rmap code is also zone device aware. Support has also been
added for folios that might need to be split in the middle
of migration (when the src and dst do not agree on
MIGRATE_PFN_COMPOUND), that occurs when src side of the migration can
migrate large pages, but the destination has not been able to allocate
large pages. The code supported and used folio_split() when migrating
THP pages, this is used when MIGRATE_VMA_SELECT_COMPOUND is not passed
as an argument to migrate_vma_setup().

The test infrastructure lib/test_hmm.c has been enhanced to support THP
migration. A new ioctl to emulate failure of large page allocations has
been added to test the folio split code path. hmm-tests.c has new test
cases for huge page migration and to test the folio split path.

The nouveau dmem code has been enhanced to use the new THP migration
capability.

mTHP support:

The patches hard code, HPAGE_PMD_NR in a few places, but the code has
been kept generic to support various order sizes. With additional
refactoring of the code support of different order sizes should be
possible.

References:
[1] https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/

These patches are built on top of mm-everything-2025-03-04-05-51

Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Donet Tom <donettom@linux.ibm.com>

Balbir Singh (11):
  mm/zone_device: support large zone device private folios
  mm/migrate_device: flags for selecting device private THP pages
  mm/thp: zone_device awareness in THP handling code
  mm/migrate_device: THP migration of zone device pages
  mm/memory/fault: Add support for zone device THP fault handling
  lib/test_hmm: test cases and support for zone device private THP
  mm/memremap: Add folio_split support
  mm/thp: add split during migration support
  lib/test_hmm: add test case for split pages
  selftests/mm/hmm-tests: new tests for zone device THP migration
  gpu/drm/nouveau: Add THP migration support

 drivers/gpu/drm/nouveau/nouveau_dmem.c | 244 +++++++++----
 drivers/gpu/drm/nouveau/nouveau_svm.c  |   6 +-
 drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
 include/linux/huge_mm.h                |  18 +-
 include/linux/memremap.h               |  29 +-
 include/linux/migrate.h                |   2 +
 include/linux/mm.h                     |   1 +
 lib/test_hmm.c                         | 387 ++++++++++++++++----
 lib/test_hmm_uapi.h                    |   3 +
 mm/huge_memory.c                       | 242 +++++++++---
 mm/memory.c                            |   6 +-
 mm/memremap.c                          |  50 ++-
 mm/migrate.c                           |   2 +
 mm/migrate_device.c                    | 488 +++++++++++++++++++++----
 mm/page_alloc.c                        |   1 +
 mm/page_vma_mapped.c                   |  10 +
 mm/rmap.c                              |  19 +-
 tools/testing/selftests/mm/hmm-tests.c | 407 +++++++++++++++++++++
 18 files changed, 1630 insertions(+), 288 deletions(-)

Comments

Matthew Brost March 6, 2025, 11:08 p.m. UTC | #1
On Thu, Mar 06, 2025 at 03:42:28PM +1100, Balbir Singh wrote:

This is an exciting series to see. As of today, we have just merged this
series into the DRM subsystem / Xe [2], which adds very basic SVM
support. One of the performance bottlenecks we quickly identified was
the lack of THP for device pages—I believe our profiling showed that 96%
of the time spent on 2M page GPU faults was within the migrate_vma_*
functions. Presumably, this will help significantly.

We will likely attempt to pull this code into GPU SVM / Xe fairly soon.
I believe we will encounter a conflict since [2] includes these patches
[3] [4], but we should be able to resolve that. These patches might make
it into the 6.15 PR — TBD but I can get back to you on that.

I have one question—does this series contain all the required core MM
changes for us to give it a try? That is, do I need to include any other
code from the list to test this out?

Matt

[2] https://patchwork.freedesktop.org/series/137870/
[3] https://patchwork.freedesktop.org/patch/641207/?series=137870&rev=8
[4] https://patchwork.freedesktop.org/patch/641214/?series=137870&rev=8

> This patch series adds support for THP migration of zone device pages.
> To do so, the patches implement support for folio zone device pages
> by adding support for setting up larger order pages.
> 
> These patches build on the earlier posts by Ralph Campbell [1]
> 
> Two new flags are added in vma_migration to select and mark compound pages.
> migrate_vma_setup(), migrate_vma_pages() and migrate_vma_finalize()
> support migration of these pages when MIGRATE_VMA_SELECT_COMPOUND
> is passed in as arguments.
> 
> The series also adds zone device awareness to (m)THP pages along
> with fault handling of large zone device private pages. page vma walk
> and the rmap code is also zone device aware. Support has also been
> added for folios that might need to be split in the middle
> of migration (when the src and dst do not agree on
> MIGRATE_PFN_COMPOUND), that occurs when src side of the migration can
> migrate large pages, but the destination has not been able to allocate
> large pages. The code supported and used folio_split() when migrating
> THP pages, this is used when MIGRATE_VMA_SELECT_COMPOUND is not passed
> as an argument to migrate_vma_setup().
> 
> The test infrastructure lib/test_hmm.c has been enhanced to support THP
> migration. A new ioctl to emulate failure of large page allocations has
> been added to test the folio split code path. hmm-tests.c has new test
> cases for huge page migration and to test the folio split path.
> 
> The nouveau dmem code has been enhanced to use the new THP migration
> capability.
> 
> mTHP support:
> 
> The patches hard code, HPAGE_PMD_NR in a few places, but the code has
> been kept generic to support various order sizes. With additional
> refactoring of the code support of different order sizes should be
> possible.
> 
> References:
> [1] https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/
> 
> These patches are built on top of mm-everything-2025-03-04-05-51
> 
> Cc: Karol Herbst <kherbst@redhat.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Barry Song <baohua@kernel.org>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Jane Chu <jane.chu@oracle.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Donet Tom <donettom@linux.ibm.com>
> 
> Balbir Singh (11):
>   mm/zone_device: support large zone device private folios
>   mm/migrate_device: flags for selecting device private THP pages
>   mm/thp: zone_device awareness in THP handling code
>   mm/migrate_device: THP migration of zone device pages
>   mm/memory/fault: Add support for zone device THP fault handling
>   lib/test_hmm: test cases and support for zone device private THP
>   mm/memremap: Add folio_split support
>   mm/thp: add split during migration support
>   lib/test_hmm: add test case for split pages
>   selftests/mm/hmm-tests: new tests for zone device THP migration
>   gpu/drm/nouveau: Add THP migration support
> 
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 244 +++++++++----
>  drivers/gpu/drm/nouveau/nouveau_svm.c  |   6 +-
>  drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
>  include/linux/huge_mm.h                |  18 +-
>  include/linux/memremap.h               |  29 +-
>  include/linux/migrate.h                |   2 +
>  include/linux/mm.h                     |   1 +
>  lib/test_hmm.c                         | 387 ++++++++++++++++----
>  lib/test_hmm_uapi.h                    |   3 +
>  mm/huge_memory.c                       | 242 +++++++++---
>  mm/memory.c                            |   6 +-
>  mm/memremap.c                          |  50 ++-
>  mm/migrate.c                           |   2 +
>  mm/migrate_device.c                    | 488 +++++++++++++++++++++----
>  mm/page_alloc.c                        |   1 +
>  mm/page_vma_mapped.c                   |  10 +
>  mm/rmap.c                              |  19 +-
>  tools/testing/selftests/mm/hmm-tests.c | 407 +++++++++++++++++++++
>  18 files changed, 1630 insertions(+), 288 deletions(-)
> 
> -- 
> 2.48.1
>
Balbir Singh March 6, 2025, 11:20 p.m. UTC | #2
On 3/7/25 10:08, Matthew Brost wrote:
> On Thu, Mar 06, 2025 at 03:42:28PM +1100, Balbir Singh wrote:
> 
> This is an exciting series to see. As of today, we have just merged this
> series into the DRM subsystem / Xe [2], which adds very basic SVM
> support. One of the performance bottlenecks we quickly identified was
> the lack of THP for device pages—I believe our profiling showed that 96%
> of the time spent on 2M page GPU faults was within the migrate_vma_*
> functions. Presumably, this will help significantly.
> 
> We will likely attempt to pull this code into GPU SVM / Xe fairly soon.
> I believe we will encounter a conflict since [2] includes these patches
> [3] [4], but we should be able to resolve that. These patches might make
> it into the 6.15 PR — TBD but I can get back to you on that.
> 
> I have one question—does this series contain all the required core MM
> changes for us to give it a try? That is, do I need to include any other
> code from the list to test this out?
> 

Thank you, the patches are built on top of mm-everything-2025-03-04-05-51, which
includes changes by Alistair to fix fs/dax reference counting and changes
By Zi Yan (folio split changes), the series builds on top of those, but the
patches are not dependent on the folio split changes, IIRC

Please do report bugs/issues that you come across.

Balbir