[0/6] nouveau/hmm: add support for mapping large pages

Message-ID: <20200508192009.15302-1-rcampbell@nvidia.com>
From: Ralph Campbell <rcampbell@nvidia.com>
To: <nouveau@lists.freedesktop.org>, <linux-rdma@vger.kernel.org>, <linux-mm@kvack.org>, <linux-kselftest@vger.kernel.org>, <linux-kernel@vger.kernel.org>
CC: Jerome Glisse <jglisse@redhat.com>, John Hubbard <jhubbard@nvidia.com>, Christoph Hellwig <hch@lst.de>, Jason Gunthorpe <jgg@mellanox.com>, Ben Skeggs <bskeggs@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, Shuah Khan <shuah@linuxfoundation.org>, Ralph Campbell <rcampbell@nvidia.com>
Subject: [PATCH 0/6] nouveau/hmm: add support for mapping large pages
Date: Fri, 8 May 2020 12:20:03 -0700

Ralph Campbell May 8, 2020, 7:20 p.m. UTC

hmm_range_fault() returns an array of page frame numbers and flags for
how the pages are mapped in the requested process' page tables. The PFN
can be used to get the struct page with hmm_pfn_to_page() and the page size
order can be determined with compound_order(page) but if the page is larger
than order 0 (PAGE_SIZE), there is no indication that the page is mapped
using a larger page size. To be fully general, hmm_range_fault() would need
to return the mapping size to handle cases like a 1GB compound page being
mapped with 2MB PMD entries. However, the most common case is that the
mapping size is the same as the underlying compound page size.
This series adds a new output flag to indicate this so that callers know it
is safe to use a large device page table mapping if one is available.
Nouveau and the HMM tests are updated to use the new flag.
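
The caller-side decision described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
the MODEL_* bit positions, struct model_page, and
model_device_map_order() are hypothetical names invented for the sketch
and are not the definitions in include/linux/hmm.h. In a driver, the page
order would come from compound_order(hmm_pfn_to_page(entry)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this model; not the kernel's definitions. */
#define MODEL_PFN_VALID    (UINT64_C(1) << 63)
#define MODEL_PFN_WRITE    (UINT64_C(1) << 62)
#define MODEL_PFN_COMPOUND (UINT64_C(1) << 61)

/* Stand-in for struct page; only the compound order is modelled. */
struct model_page {
	unsigned int order;
};

/*
 * Pick the device mapping order for one output entry.
 * Returns -1 if the entry is not valid and must not be mapped.
 */
static int model_device_map_order(uint64_t entry,
				  const struct model_page *page)
{
	assert(page != NULL);

	if (!(entry & MODEL_PFN_VALID))
		return -1;
	/* Only trust the compound order when the flag says the CPU
	 * mapping covers the whole compound page. */
	if (entry & MODEL_PFN_COMPOUND)
		return (int)page->order;
	/* Otherwise fall back to a PAGE_SIZE (order 0) device mapping. */
	return 0;
}
```

In this model an order-9 page whose entry lacks the compound flag is
still mapped on the device one base page at a time.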

Note that this series depends on a patch queued in Ben Skeggs' nouveau
tree ("nouveau/hmm: map pages after migration") and the patches queued
in Jason's HMM tree.
There is also a patch outstanding ("nouveau/hmm: fix nouveau_dmem_chunk
allocations") that is independent of the above and could be applied
before or after.


Ralph Campbell (6):
  nouveau/hmm: map pages after migration
  nouveau: make nvkm_vmm_ctor() and nvkm_mmu_ptp_get() static
  nouveau/hmm: fault one page at a time
  mm/hmm: add output flag for compound page mapping
  nouveau/hmm: support mapping large sysmem pages
  hmm: add tests for HMM_PFN_COMPOUND flag

 drivers/gpu/drm/nouveau/nouveau_dmem.c        |  46 ++-
 drivers/gpu/drm/nouveau/nouveau_dmem.h        |   2 +
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 288 +++++++++---------
 drivers/gpu/drm/nouveau/nouveau_svm.h         |   5 +
 .../gpu/drm/nouveau/nvkm/subdev/mmu/base.c    |   6 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/priv.h    |   2 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  12 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   3 -
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |  29 +-
 include/linux/hmm.h                           |   4 +-
 lib/test_hmm.c                                |   2 +
 lib/test_hmm_uapi.h                           |   2 +
 mm/hmm.c                                      |  10 +-
 tools/testing/selftests/vm/hmm-tests.c        |  76 +++++
 14 files changed, 311 insertions(+), 176 deletions(-)

Matthew Wilcox May 8, 2020, 7:59 p.m. UTC | #1

On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
> hmm_range_fault() returns an array of page frame numbers and flags for
> how the pages are mapped in the requested process' page tables. The PFN
> can be used to get the struct page with hmm_pfn_to_page() and the page size
> order can be determined with compound_order(page) but if the page is larger
> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
> using a larger page size. To be fully general, hmm_range_fault() would need
> to return the mapping size to handle cases like a 1GB compound page being
> mapped with 2MB PMD entries. However, the most common case is that the
> mapping size is the same as the underlying compound page size.
> This series adds a new output flag to indicate this so that callers know it
> is safe to use a large device page table mapping if one is available.
> Nouveau and the HMM tests are updated to use the new flag.

This explanation doesn't make any sense.  It doesn't matter how somebody
else has it mapped; if it's a PMD-sized page, you can map it with a
2MB mapping.

Ralph Campbell May 8, 2020, 8:17 p.m. UTC | #2

On 5/8/20 12:59 PM, Matthew Wilcox wrote:
> This explanation doesn't make any sense.  It doesn't matter how somebody
> else has it mapped; if it's a PMD-sized page, you can map it with a
> 2MB mapping.
> 

Sure, the I/O will work OK, but is it safe?
Copy on write isn't an issue? Will splitting a PMD in one process due to
mprotect of a shared page cause other processes' page tables to be split
the same way?
Recall that these are system memory pages that could be THPs, shmem, hugetlbfs,
mmap shared file pages, etc.

Matthew Wilcox May 9, 2020, 3:17 a.m. UTC | #3

On Fri, May 08, 2020 at 01:17:55PM -0700, Ralph Campbell wrote:
> On 5/8/20 12:59 PM, Matthew Wilcox wrote:
> > This explanation doesn't make any sense.  It doesn't matter how somebody
> > else has it mapped; if it's a PMD-sized page, you can map it with a
> > 2MB mapping.
> 
> Sure, the I/O will work OK, but is it safe?
> Copy on write isn't an issue? Will splitting a PMD in one process due to
> mprotect of a shared page cause other processes' page tables to be split
> the same way?

Are you saying that if you call this function on an address range of a
process which has done COW of a single page in the middle of a THP,
you want to return with this flag clear, but if the THP is still intact,
you want to set this flag?

> Recall that these are system memory pages that could be THPs, shmem, hugetlbfs,
> mmap shared file pages, etc.

Ralph Campbell May 11, 2020, 5:07 p.m. UTC | #4

On 5/8/20 8:17 PM, Matthew Wilcox wrote:
> On Fri, May 08, 2020 at 01:17:55PM -0700, Ralph Campbell wrote:
>> On 5/8/20 12:59 PM, Matthew Wilcox wrote:
>>> This explanation doesn't make any sense.  It doesn't matter how somebody
>>> else has it mapped; if it's a PMD-sized page, you can map it with a
>>> 2MB mapping.
>>
>> Sure, the I/O will work OK, but is it safe?
>> Copy on write isn't an issue? Will splitting a PMD in one process due to
>> mprotect of a shared page cause other processes' page tables to be split
>> the same way?
> 
> Are you saying that if you call this function on an address range of a
> process which has done COW of a single page in the middle of a THP,
> you want to return with this flag clear, but if the THP is still intact,
> you want to set this flag?

Correct. I want the GPU to see the same faults that the CPU would see when trying
to access the same addresses. All faults, whether from CPU or GPU, end up calling
handle_mm_fault() to handle the fault and update the GPU/CPU page tables.

>> Recall that these are system memory pages that could be THPs, shmem, hugetlbfs,
>> mmap shared file pages, etc.

Jason Gunthorpe May 25, 2020, 1:41 p.m. UTC | #5

On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
> hmm_range_fault() returns an array of page frame numbers and flags for
> how the pages are mapped in the requested process' page tables. The PFN
> can be used to get the struct page with hmm_pfn_to_page() and the page size
> order can be determined with compound_order(page) but if the page is larger
> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
> using a larger page size. To be fully general, hmm_range_fault() would need
> to return the mapping size to handle cases like a 1GB compound page being
> mapped with 2MB PMD entries. However, the most common case is that the
> mapping size is the same as the underlying compound page size.
> This series adds a new output flag to indicate this so that callers know it
> is safe to use a large device page table mapping if one is available.
> Nouveau and the HMM tests are updated to use the new flag.
> 
> Note that this series depends on a patch queued in Ben Skeggs' nouveau
> tree ("nouveau/hmm: map pages after migration") and the patches queued
> in Jason's HMM tree.
> There is also a patch outstanding ("nouveau/hmm: fix nouveau_dmem_chunk
> allocations") that is independent of the above and could be applied
> before or after.

Did Christoph and Matt's remarks get addressed here?

I think ODP could use something like this. Currently it checks every
page to get back to the huge page size, and this flag would optimize
that.

Jason

Ralph Campbell May 26, 2020, 5:32 p.m. UTC | #6

On 5/25/20 6:41 AM, Jason Gunthorpe wrote:
> Did Christoph and Matt's remarks get addressed here?

Both questioned the need to add the HMM_PFN_COMPOUND flag to the
hmm_range_fault() output array saying that the PFN can be used to get the
struct page pointer and the page can be examined to determine the page size.
My response is that while that is true, it is also important that the device
only access the same parts of a large page that the process/CPU has access to.
There are places where a large page is mapped with smaller page table entries
when a page is shared by multiple processes.
After I explained this, I haven't seen any further comments from Christoph
and Matt. I'm still looking for reviews, acks, or suggested changes.
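
To make the distinction concrete, here is a minimal userspace sketch of
when the flag would be set on the producing side. It is a model only: the
MODEL_* bits and model_make_entry() are hypothetical names, and the orders
assume 4KB base pages (2MB is order 9, 1GB is order 18). The flag is set
only when the CPU mapping order equals the compound page order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this model; not the kernel's definitions. */
#define MODEL_PFN_VALID    (UINT64_C(1) << 63)
#define MODEL_PFN_COMPOUND (UINT64_C(1) << 61)

/*
 * Build one output entry from the PFN, the order of the underlying
 * compound page, and the order of the CPU page table entry mapping it.
 */
static uint64_t model_make_entry(uint64_t pfn, unsigned int page_order,
				 unsigned int map_order)
{
	uint64_t entry = pfn | MODEL_PFN_VALID;

	/* A mapping can never be larger than the page behind it. */
	assert(map_order <= page_order);

	/* Flag only the case where the mapping covers the whole page. */
	if (page_order > 0 && map_order == page_order)
		entry |= MODEL_PFN_COMPOUND;
	return entry;
}
```

Under this model an intact THP mapped by a PMD gets the flag, while the
same THP after a PMD split, or a 1GB page mapped with 2MB PMD entries,
does not.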


> I think ODP could use something like this. Currently it checks every
> page to get back to the huge page size, and this flag would optimize
> that.
> 
> Jason

Jason Gunthorpe May 29, 2020, 11:24 p.m. UTC | #7

On Tue, May 26, 2020 at 10:32:48AM -0700, Ralph Campbell wrote:
> 
> On 5/25/20 6:41 AM, Jason Gunthorpe wrote:
> > Did Christoph and Matt's remarks get addressed here?
> 
> Both questioned the need to add the HMM_PFN_COMPOUND flag to the
> hmm_range_fault() output array saying that the PFN can be used to get the
> struct page pointer and the page can be examined to determine the page size.
> My response is that while that is true, it is also important that the device
> only access the same parts of a large page that the process/CPU has access to.
> There are places where a large page is mapped with smaller page table entries
> when a page is shared by multiple processes.
> After I explained this, I haven't seen any further comments from Christoph
> and Matt. I'm still looking for reviews, acks, or suggested changes.

Okay, well, we reached the merge window, so since there may be some
conflicts, please repost in three weeks.

It would be more compelling if there were some performance data showing
whether it is much of a win versus the 'compute large page' algorithm
that something like ODP uses.

Jason
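
For comparison, a 'compute large page' scan of the kind mentioned above
might look roughly like the following userspace sketch. It is not ODP's
code; model_contig_order() is a hypothetical helper that takes bare PFNs
(flags already masked off) and returns the largest order for which the
leading entries are physically contiguous and naturally aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the largest order (up to max_order) such that the first
 * (1 << order) entries of pfns[] are physically contiguous and
 * pfns[0] is aligned to (1 << order).
 */
static unsigned int model_contig_order(const uint64_t *pfns, size_t npages,
				       unsigned int max_order)
{
	unsigned int order = 0;
	size_t checked = 1;	/* pfns[0..checked) is known contiguous */

	assert(npages > 0);
	assert(max_order < 8 * sizeof(size_t));

	while (order < max_order) {
		size_t want = (size_t)1 << (order + 1);

		/* Stop if the run would not fit or is not aligned. */
		if (want > npages || (pfns[0] & (want - 1)))
			break;
		/* Extend the contiguous prefix; each entry is read once. */
		while (checked < want && pfns[checked] == pfns[0] + checked)
			checked++;
		if (checked < want)
			break;
		order++;
	}
	return order;
}
```

The scan has to read every entry in the candidate range, which is the
per-page cost that a single flag on the first entry would avoid.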
