Message ID | 20200508192009.15302-1-rcampbell@nvidia.com (mailing list archive) |
---|---|
Series | nouveau/hmm: add support for mapping large pages |
On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
> hmm_range_fault() returns an array of page frame numbers and flags for
> how the pages are mapped in the requested process' page tables. The PFN
> can be used to get the struct page with hmm_pfn_to_page() and the page size
> order can be determined with compound_order(page) but if the page is larger
> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
> using a larger page size. To be fully general, hmm_range_fault() would need
> to return the mapping size to handle cases like a 1GB compound page being
> mapped with 2MB PMD entries. However, the most common case is the mapping
> size the same as the underlying compound page size.
> This series adds a new output flag to indicate this so that callers know it
> is safe to use a large device page table mapping if one is available.
> Nouveau and the HMM tests are updated to use the new flag.

This explanation doesn't make any sense. It doesn't matter how somebody
else has it mapped; if it's a PMD-sized page, you can map it with a
2MB mapping.

On 5/8/20 12:59 PM, Matthew Wilcox wrote:
> On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
>> hmm_range_fault() returns an array of page frame numbers and flags for
>> how the pages are mapped in the requested process' page tables. The PFN
>> can be used to get the struct page with hmm_pfn_to_page() and the page size
>> order can be determined with compound_order(page) but if the page is larger
>> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
>> using a larger page size. To be fully general, hmm_range_fault() would need
>> to return the mapping size to handle cases like a 1GB compound page being
>> mapped with 2MB PMD entries. However, the most common case is the mapping
>> size the same as the underlying compound page size.
>> This series adds a new output flag to indicate this so that callers know it
>> is safe to use a large device page table mapping if one is available.
>> Nouveau and the HMM tests are updated to use the new flag.
>
> This explanation doesn't make any sense. It doesn't matter how somebody
> else has it mapped; if it's a PMD-sized page, you can map it with a
> 2MB mapping.
>

Sure, the I/O will work OK, but is it safe?
Copy on write isn't an issue? splitting a PMD in one process due to
mprotect of a shared page will cause other process' page tables to be split
the same way?

Recall that these are system memory pages that could be THPs, shmem, hugetlbfs,
mmap shared file pages, etc.

On Fri, May 08, 2020 at 01:17:55PM -0700, Ralph Campbell wrote:
> On 5/8/20 12:59 PM, Matthew Wilcox wrote:
> > On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
> > > hmm_range_fault() returns an array of page frame numbers and flags for
> > > how the pages are mapped in the requested process' page tables. The PFN
> > > can be used to get the struct page with hmm_pfn_to_page() and the page size
> > > order can be determined with compound_order(page) but if the page is larger
> > > than order 0 (PAGE_SIZE), there is no indication that the page is mapped
> > > using a larger page size. To be fully general, hmm_range_fault() would need
> > > to return the mapping size to handle cases like a 1GB compound page being
> > > mapped with 2MB PMD entries. However, the most common case is the mapping
> > > size the same as the underlying compound page size.
> > > This series adds a new output flag to indicate this so that callers know it
> > > is safe to use a large device page table mapping if one is available.
> > > Nouveau and the HMM tests are updated to use the new flag.
> >
> > This explanation doesn't make any sense. It doesn't matter how somebody
> > else has it mapped; if it's a PMD-sized page, you can map it with a
> > 2MB mapping.
>
> Sure, the I/O will work OK, but is it safe?
> Copy on write isn't an issue? splitting a PMD in one process due to
> mprotect of a shared page will cause other process' page tables to be split
> the same way?

Are you saying that if you call this function on an address range of a
process which has done COW of a single page in the middle of a THP,
you want to return with this flag clear, but if the THP is still intact,
you want to set this flag?

> Recall that these are system memory pages that could be THPs, shmem, hugetlbfs,
> mmap shared file pages, etc.

On 5/8/20 8:17 PM, Matthew Wilcox wrote:
> On Fri, May 08, 2020 at 01:17:55PM -0700, Ralph Campbell wrote:
>> On 5/8/20 12:59 PM, Matthew Wilcox wrote:
>>> On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
>>>> hmm_range_fault() returns an array of page frame numbers and flags for
>>>> how the pages are mapped in the requested process' page tables. The PFN
>>>> can be used to get the struct page with hmm_pfn_to_page() and the page size
>>>> order can be determined with compound_order(page) but if the page is larger
>>>> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
>>>> using a larger page size. To be fully general, hmm_range_fault() would need
>>>> to return the mapping size to handle cases like a 1GB compound page being
>>>> mapped with 2MB PMD entries. However, the most common case is the mapping
>>>> size the same as the underlying compound page size.
>>>> This series adds a new output flag to indicate this so that callers know it
>>>> is safe to use a large device page table mapping if one is available.
>>>> Nouveau and the HMM tests are updated to use the new flag.
>>>
>>> This explanation doesn't make any sense. It doesn't matter how somebody
>>> else has it mapped; if it's a PMD-sized page, you can map it with a
>>> 2MB mapping.
>>
>> Sure, the I/O will work OK, but is it safe?
>> Copy on write isn't an issue? splitting a PMD in one process due to
>> mprotect of a shared page will cause other process' page tables to be split
>> the same way?
>
> Are you saying that if you call this function on an address range of a
> process which has done COW of a single page in the middle of a THP,
> you want to return with this flag clear, but if the THP is still intact,
> you want to set this flag?

Correct. I want the GPU to see the same faults that the CPU would see when
trying to access the same addresses.

All faults, whether from CPU or GPU, end up calling handle_mm_fault()
to handle the fault and update the GPU/CPU page tables.

>> Recall that these are system memory pages that could be THPs, shmem, hugetlbfs,
>> mmap shared file pages, etc.

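To make the semantics agreed on in this exchange concrete, here is a minimal userspace sketch in C. It is only a model under stated assumptions: `model_range_fault()`, `MODEL_PFN_COMPOUND`, and the other `model_*` names are hypothetical stand-ins invented for illustration and are not the kernel's `hmm_range_fault()` interface. The model sets the flag only when the CPU maps the whole 2MB range with a single PMD entry, so a range whose PMD was split by COW reports the flag clear even though most of its frames still belong to the original compound page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_PAGES_PER_PMD 512u        /* 2MB mapping / 4KB base pages */
#define MODEL_PFN_VALID     (1u << 0)
#define MODEL_PFN_COMPOUND  (1u << 1)   /* stand-in for the proposed flag */

struct model_entry {
	uint64_t pfn;
	unsigned int flags;
};

/*
 * Report one 2MB-aligned range. cpu_pfns[i] is the frame the CPU page
 * tables map at page i; pmd_mapped is true only when the CPU maps the
 * whole range with a single PMD entry.
 */
static void model_range_fault(const uint64_t *cpu_pfns, bool pmd_mapped,
			      struct model_entry *out)
{
	for (size_t i = 0; i < MODEL_PAGES_PER_PMD; i++) {
		out[i].pfn = cpu_pfns[i];
		out[i].flags = MODEL_PFN_VALID;
		if (pmd_mapped)
			out[i].flags |= MODEL_PFN_COMPOUND;
	}
}

/* In this model a device may use one large mapping only if the flag is set. */
static bool model_can_map_large(const struct model_entry *out)
{
	return (out[0].flags & MODEL_PFN_COMPOUND) != 0;
}

/* Intact THP: 512 contiguous frames behind one PMD entry. */
static bool model_demo_intact_thp(void)
{
	static uint64_t pfns[MODEL_PAGES_PER_PMD];
	static struct model_entry out[MODEL_PAGES_PER_PMD];

	for (size_t i = 0; i < MODEL_PAGES_PER_PMD; i++)
		pfns[i] = 0x100000 + i;
	model_range_fault(pfns, true, out);
	return model_can_map_large(out);
}

/* COW of page 100: the PMD was split and one PTE now points elsewhere. */
static bool model_demo_after_cow(void)
{
	static uint64_t pfns[MODEL_PAGES_PER_PMD];
	static struct model_entry out[MODEL_PAGES_PER_PMD];

	for (size_t i = 0; i < MODEL_PAGES_PER_PMD; i++)
		pfns[i] = 0x100000 + i;
	pfns[100] = 0x200000;           /* private copy made by COW */
	model_range_fault(pfns, false, out);
	return model_can_map_large(out);
}
```

In this model `model_demo_intact_thp()` returns true and `model_demo_after_cow()` returns false. The point of the second case is that the decision follows how the CPU maps the range, not the size of the underlying compound page.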
On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
> hmm_range_fault() returns an array of page frame numbers and flags for
> how the pages are mapped in the requested process' page tables. The PFN
> can be used to get the struct page with hmm_pfn_to_page() and the page size
> order can be determined with compound_order(page) but if the page is larger
> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
> using a larger page size. To be fully general, hmm_range_fault() would need
> to return the mapping size to handle cases like a 1GB compound page being
> mapped with 2MB PMD entries. However, the most common case is the mapping
> size the same as the underlying compound page size.
> This series adds a new output flag to indicate this so that callers know it
> is safe to use a large device page table mapping if one is available.
> Nouveau and the HMM tests are updated to use the new flag.
>
> Note that this series depends on a patch queued in Ben Skeggs' nouveau
> tree ("nouveau/hmm: map pages after migration") and the patches queued
> in Jason's HMM tree.
> There is also a patch outstanding ("nouveau/hmm: fix nouveau_dmem_chunk
> allocations") that is independent of the above and could be applied
> before or after.

Did Christoph and Matt's remarks get addressed here?

I think ODP could use something like this, currently it checks every
page to get back to the huge page size and this flag would optimize
that

Jason

On 5/25/20 6:41 AM, Jason Gunthorpe wrote:
> On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
>> hmm_range_fault() returns an array of page frame numbers and flags for
>> how the pages are mapped in the requested process' page tables. The PFN
>> can be used to get the struct page with hmm_pfn_to_page() and the page size
>> order can be determined with compound_order(page) but if the page is larger
>> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
>> using a larger page size. To be fully general, hmm_range_fault() would need
>> to return the mapping size to handle cases like a 1GB compound page being
>> mapped with 2MB PMD entries. However, the most common case is the mapping
>> size the same as the underlying compound page size.
>> This series adds a new output flag to indicate this so that callers know it
>> is safe to use a large device page table mapping if one is available.
>> Nouveau and the HMM tests are updated to use the new flag.
>>
>> Note that this series depends on a patch queued in Ben Skeggs' nouveau
>> tree ("nouveau/hmm: map pages after migration") and the patches queued
>> in Jason's HMM tree.
>> There is also a patch outstanding ("nouveau/hmm: fix nouveau_dmem_chunk
>> allocations") that is independent of the above and could be applied
>> before or after.
>
> Did Christoph and Matt's remarks get addressed here?

Both questioned the need to add the HMM_PFN_COMPOUND flag to the
hmm_range_fault() output array saying that the PFN can be used to get the
struct page pointer and the page can be examined to determine the page size.
My response is that while that is true, it is also important that the device only
access the same parts of a large page that the process/cpu has access to.
There are places where a large page is mapped with smaller page table entries
when a page is shared by multiple processes.
After I explained this, I haven't seen any further comments from Christoph
and Matt.

I'm still looking for reviews, acks, or suggested changes.

> I think ODP could use something like this, currently it checks every
> page to get back to the huge page size and this flag would optimize
> that
>
> Jason

On Tue, May 26, 2020 at 10:32:48AM -0700, Ralph Campbell wrote:
>
> On 5/25/20 6:41 AM, Jason Gunthorpe wrote:
> > On Fri, May 08, 2020 at 12:20:03PM -0700, Ralph Campbell wrote:
> > > hmm_range_fault() returns an array of page frame numbers and flags for
> > > how the pages are mapped in the requested process' page tables. The PFN
> > > can be used to get the struct page with hmm_pfn_to_page() and the page size
> > > order can be determined with compound_order(page) but if the page is larger
> > > than order 0 (PAGE_SIZE), there is no indication that the page is mapped
> > > using a larger page size. To be fully general, hmm_range_fault() would need
> > > to return the mapping size to handle cases like a 1GB compound page being
> > > mapped with 2MB PMD entries. However, the most common case is the mapping
> > > size the same as the underlying compound page size.
> > > This series adds a new output flag to indicate this so that callers know it
> > > is safe to use a large device page table mapping if one is available.
> > > Nouveau and the HMM tests are updated to use the new flag.
> > >
> > > Note that this series depends on a patch queued in Ben Skeggs' nouveau
> > > tree ("nouveau/hmm: map pages after migration") and the patches queued
> > > in Jason's HMM tree.
> > > There is also a patch outstanding ("nouveau/hmm: fix nouveau_dmem_chunk
> > > allocations") that is independent of the above and could be applied
> > > before or after.
> >
> > Did Christoph and Matt's remarks get addressed here?
>
> Both questioned the need to add the HMM_PFN_COMPOUND flag to the
> hmm_range_fault() output array saying that the PFN can be used to get the
> struct page pointer and the page can be examined to determine the page size.
> My response is that while that is true, it is also important that the device only
> access the same parts of a large page that the process/cpu has access to.
> There are places where a large page is mapped with smaller page table entries
> when a page is shared by multiple processes.
> After I explained this, I haven't seen any further comments from Christoph
> and Matt. I'm still looking for reviews, acks, or suggested changes.

Okay, well, we reached the merge window, so, since there may be some
conflicts, repost again in three weeks.

It would be more compelling if there was some performance data showing
whether it is much of a win vs the 'compute large page' algorithm
something like ODP uses.

Jason
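For readers unfamiliar with the alternative Jason refers to, a per-page scan that recovers a large page size from the PFN array might look roughly like the C sketch below. This is an illustration only, not ODP's actual code: `model_contig_order()` and `model_demo_order()` are hypothetical names, and the sketch checks only physical contiguity and PFN alignment, ignoring virtual-address alignment and per-page permissions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NPAGES 512u   /* one 2MB range of 4KB pages (assumption) */

/*
 * Hypothetical helper: return the largest order such that the first
 * (1 << order) entries of pfns[] are physically contiguous and start
 * on a (1 << order)-aligned frame. Every entry in the run is inspected.
 */
static unsigned int model_contig_order(const uint64_t *pfns, size_t npages)
{
	size_t run = 1;
	unsigned int order = 0;

	if (npages == 0)
		return 0;

	/* Length of the contiguous run starting at pfns[0]. */
	while (run < npages && pfns[run] == pfns[0] + run)
		run++;

	/* Grow the order while the next size still fits and stays aligned. */
	while (((size_t)2 << order) <= run &&
	       (pfns[0] & (((uint64_t)2 << order) - 1)) == 0)
		order++;

	return order;
}

/* Build 512 contiguous frames, optionally breaking contiguity at break_at. */
static unsigned int model_demo_order(uint64_t first_pfn, size_t break_at)
{
	static uint64_t pfns[MODEL_NPAGES];

	for (size_t i = 0; i < MODEL_NPAGES; i++)
		pfns[i] = first_pfn + i;
	if (break_at < MODEL_NPAGES)
		pfns[break_at] = first_pfn + 0x100000;  /* unrelated frame */
	return model_contig_order(pfns, MODEL_NPAGES);
}
```

With an intact, aligned run, `model_demo_order(0x100000, 512)` returns 9 (a 2MB mapping); breaking contiguity at index 100 gives 6, and a misaligned start such as `0x100001` gives 0. The cost is a walk over every entry, which is the work a per-range output flag would avoid. Note also that such a scan cannot tell an intact PMD mapping from a split one whose PTEs still point at contiguous frames, which is the distinction Ralph argues for earlier in the thread.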