diff mbox

[4/5] mm: fix cache mode of dax pmd mappings

Message ID 147318058165.30325.16762406881120129093.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Sept. 6, 2016, 4:49 p.m. UTC
track_pfn_insert() is marking dax mappings as uncacheable.

It is used to keep mappings attributes consistent across a remapped range.
However, since dax regions are never registered via track_pfn_remap(), the
caching mode lookup for dax pfns always returns _PAGE_CACHE_MODE_UC.  We do not
use track_pfn_insert() in the dax-pte path, and we always want to use the
pgprot of the vma itself, so drop this call.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/huge_memory.c |    2 --
 1 file changed, 2 deletions(-)

Comments

Andrew Morton Sept. 6, 2016, 8:17 p.m. UTC | #1
On Tue, 06 Sep 2016 09:49:41 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> track_pfn_insert() is marking dax mappings as uncacheable.
> 
> It is used to keep mappings attributes consistent across a remapped range.
> However, since dax regions are never registered via track_pfn_remap(), the
> caching mode lookup for dax pfns always returns _PAGE_CACHE_MODE_UC.  We do not
> use track_pfn_insert() in the dax-pte path, and we always want to use the
> pgprot of the vma itself, so drop this call.
> 
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
> Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
> Reported-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Changelog fails to explain the user-visible effects of the patch.  The
stable maintainer(s) will look at this and wonder "ytf was I sent
this".

After fixing that, 

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Dan Williams Sept. 6, 2016, 9:52 p.m. UTC | #2
On Tue, Sep 6, 2016 at 1:17 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 06 Sep 2016 09:49:41 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>
>> track_pfn_insert() is marking dax mappings as uncacheable.
>>
>> It is used to keep mappings attributes consistent across a remapped range.
>> However, since dax regions are never registered via track_pfn_remap(), the
>> caching mode lookup for dax pfns always returns _PAGE_CACHE_MODE_UC.  We do not
>> use track_pfn_insert() in the dax-pte path, and we always want to use the
>> pgprot of the vma itself, so drop this call.
>>
>> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
>> Cc: Matthew Wilcox <mawilcox@microsoft.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
>> Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
>> Reported-by: Toshi Kani <toshi.kani@hpe.com>
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Changelog fails to explain the user-visible effects of the patch.  The
> stable maintainer(s) will look at this and wonder "ytf was I sent
> this".

True, I'll change it to this:

track_pfn_insert() is marking dax mappings as uncacheable rendering
them impractical for application usage.  DAX-pte mappings are cached
and the goal of establishing DAX-pmd mappings is to attain more
performance, not dramatically less (3 orders of magnitude).

Deleting the call to track_pfn_insert() in vmf_insert_pfn_pmd() lets
the default pgprot (write-back cache enabled) from the vma be used for
the mapping which yields the expected performance improvement over
DAX-pte mappings.

track_pfn_insert() is meant to keep the cache mode for a given range
synchronized across different users of remap_pfn_range() and
vm_insert_pfn_prot().  DAX uses neither of those mapping methods, and
the pmem driver is already marking its memory ranges as write-back
cache enabled.  So, removing the call to track_pfn_insert() leaves the
kernel no worse off than the current situation where a user could map
the range via /dev/mem with an incompatible cache mode compared to the
driver.

> After fixing that,
>
> Acked-by: Andrew Morton <akpm@linux-foundation.org>

Thanks Andrew!
Kani, Toshi Sept. 7, 2016, 7:39 p.m. UTC | #3
On Tue, 2016-09-06 at 14:52 -0700, Dan Williams wrote:
> On Tue, Sep 6, 2016 at 1:17 PM, Andrew Morton <akpm@linux-foundation.

> org> wrote:

> > 

> > On Tue, 06 Sep 2016 09:49:41 -0700 Dan Williams <dan.j.williams@int

> > el.com> wrote:

> > 

> > > 

> > > track_pfn_insert() is marking dax mappings as uncacheable.

> > > 

> > > It is used to keep mappings attributes consistent across a

> > > remapped range. However, since dax regions are never registered

> > > via track_pfn_remap(), the caching mode lookup for dax pfns

> > > always returns _PAGE_CACHE_MODE_UC.  We do not use

> > > track_pfn_insert() in the dax-pte path, and we always want to use

> > > the pgprot of the vma itself, so drop this call.

> > > 

> > > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>

> > > Cc: Matthew Wilcox <mawilcox@microsoft.com>

> > > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> > > Cc: Andrew Morton <akpm@linux-foundation.org>

> > > Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>

> > > Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>

> > > Reported-by: Toshi Kani <toshi.kani@hpe.com>

> > > Cc: <stable@vger.kernel.org>

> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>

> > 

> > Changelog fails to explain the user-visible effects of the

> > patch.  The stable maintainer(s) will look at this and wonder "ytf

> > was I sent this".

> 

> True, I'll change it to this:

> 

> track_pfn_insert() is marking dax mappings as uncacheable rendering

> them impractical for application usage.  DAX-pte mappings are cached

> and the goal of establishing DAX-pmd mappings is to attain more

> performance, not dramatically less (3 orders of magnitude).

> 

> Deleting the call to track_pfn_insert() in vmf_insert_pfn_pmd() lets

> the default pgprot (write-back cache enabled) from the vma be used

> for the mapping which yields the expected performance improvement

> over DAX-pte mappings.

> 

> track_pfn_insert() is meant to keep the cache mode for a given range

> synchronized across different users of remap_pfn_range() and

> vm_insert_pfn_prot().  DAX uses neither of those mapping methods, and

> the pmem driver is already marking its memory ranges as write-back

> cache enabled.  So, removing the call to track_pfn_insert() leaves

> the kernel no worse off than the current situation where a user could

> map the range via /dev/mem with an incompatible cache mode compared

> to the driver.


I think devm_memremap_pages() should call reserve_memtype() on x86 to
keep it consistent with devm_memremap() on this regard.  We may need an
arch stub for reserve_memtype(), though.  Then, track_pfn_insert()
should have no issue in this case.

Thanks,
-Toshi
Dan Williams Sept. 7, 2016, 7:45 p.m. UTC | #4
On Wed, Sep 7, 2016 at 12:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-09-06 at 14:52 -0700, Dan Williams wrote:
>> On Tue, Sep 6, 2016 at 1:17 PM, Andrew Morton <akpm@linux-foundation.
>> org> wrote:
>> >
>> > On Tue, 06 Sep 2016 09:49:41 -0700 Dan Williams <dan.j.williams@int
>> > el.com> wrote:
>> >
>> > >
>> > > track_pfn_insert() is marking dax mappings as uncacheable.
>> > >
>> > > It is used to keep mappings attributes consistent across a
>> > > remapped range. However, since dax regions are never registered
>> > > via track_pfn_remap(), the caching mode lookup for dax pfns
>> > > always returns _PAGE_CACHE_MODE_UC.  We do not use
>> > > track_pfn_insert() in the dax-pte path, and we always want to use
>> > > the pgprot of the vma itself, so drop this call.
>> > >
>> > > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
>> > > Cc: Matthew Wilcox <mawilcox@microsoft.com>
>> > > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > > Cc: Andrew Morton <akpm@linux-foundation.org>
>> > > Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
>> > > Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
>> > > Reported-by: Toshi Kani <toshi.kani@hpe.com>
>> > > Cc: <stable@vger.kernel.org>
>> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> >
>> > Changelog fails to explain the user-visible effects of the
>> > patch.  The stable maintainer(s) will look at this and wonder "ytf
>> > was I sent this".
>>
>> True, I'll change it to this:
>>
>> track_pfn_insert() is marking dax mappings as uncacheable rendering
>> them impractical for application usage.  DAX-pte mappings are cached
>> and the goal of establishing DAX-pmd mappings is to attain more
>> performance, not dramatically less (3 orders of magnitude).
>>
>> Deleting the call to track_pfn_insert() in vmf_insert_pfn_pmd() lets
>> the default pgprot (write-back cache enabled) from the vma be used
>> for the mapping which yields the expected performance improvement
>> over DAX-pte mappings.
>>
>> track_pfn_insert() is meant to keep the cache mode for a given range
>> synchronized across different users of remap_pfn_range() and
>> vm_insert_pfn_prot().  DAX uses neither of those mapping methods, and
>> the pmem driver is already marking its memory ranges as write-back
>> cache enabled.  So, removing the call to track_pfn_insert() leaves
>> the kernel no worse off than the current situation where a user could
>> map the range via /dev/mem with an incompatible cache mode compared
>> to the driver.
>
> I think devm_memremap_pages() should call reserve_memtype() on x86 to
> keep it consistent with devm_memremap() on this regard.  We may need an
> arch stub for reserve_memtype(), though.  Then, track_pfn_insert()
> should have no issue in this case.

Yes, indeed!  In fact I already have that re-write getting 0day
coverage before posting.  It occurred to me while re-writing the
changelog per Andrew's prompting.
diff mbox

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a6abd76baa72..338eff05c77a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -676,8 +676,6 @@  int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 	if (addr < vma->vm_start || addr >= vma->vm_end)
 		return VM_FAULT_SIGBUS;
-	if (track_pfn_insert(vma, &pgprot, pfn))
-		return VM_FAULT_SIGBUS;
 	insert_pfn_pmd(vma, addr, pmd, pfn, pgprot, write);
 	return VM_FAULT_NOPAGE;
 }