Message ID | 20170517171639.14501-1-ross.zwisler@linux.intel.com |
---|---|
State | New, archived |
```
On 05/17/2017 10:16 AM, Ross Zwisler wrote:
> @@ -3061,7 +3061,7 @@ static int pte_alloc_one_map(struct vm_fault *vmf)
>  	 * through an atomic read in C, which is what pmd_trans_unstable()
>  	 * provides.
>  	 */
> -	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> +	if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
>  		return VM_FAULT_NOPAGE;

I'm worried we are very unlikely to get this right in the future. It's
totally not obvious what the ordering requirement is here.

Could we move pmd_devmap() and pmd_trans_unstable() into a helper that
gets the ordering right and also spells out the ordering requirement?
```
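Dave's suggestion can be illustrated with a small userspace C model. This is a sketch under stated assumptions, not kernel code: `pmd_t`, the `MOCK_*` flag bits and the simplified bodies of `pmd_none()`, `pmd_devmap()`, `pmd_trans_huge()`, `pmd_bad()` and `pmd_trans_unstable()` are hypothetical stand-ins, modeled only on the commit message's description that a DAX huge PMD trips the bad-pmd check, which warns and returns 1. The helper name `pmd_devmap_or_unstable()` is invented for illustration. The idea is that one helper encodes the required order in a single place, together with a comment saying why.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits for a simplified userspace PMD model. */
#define MOCK_PRESENT 0x1UL
#define MOCK_HUGE    0x2UL /* leaf entry mapping a huge page */
#define MOCK_DEVMAP  0x4UL /* huge page backed by device memory (DAX) */

typedef struct { unsigned long val; } pmd_t;

static int bad_pmd_warnings; /* counts simulated "bad pmd" messages */

static int pmd_none(pmd_t pmd)   { return pmd.val == 0; }
static int pmd_devmap(pmd_t pmd) { return (pmd.val & MOCK_DEVMAP) != 0; }

/* Model: a devmap huge entry is NOT reported as transparent-huge. */
static int pmd_trans_huge(pmd_t pmd)
{
	return (pmd.val & (MOCK_HUGE | MOCK_DEVMAP)) == MOCK_HUGE;
}

/* Model: a huge leaf entry fails the "points to a page table" check. */
static int pmd_bad(pmd_t pmd) { return (pmd.val & MOCK_HUGE) != 0; }

/*
 * Simplified stand-in for pmd_trans_unstable() and the
 * pmd_none_or_trans_huge_or_clear_bad() check it calls:
 * a "bad" entry prints a warning and is reported as unstable.
 */
static int pmd_trans_unstable(pmd_t *pmd)
{
	pmd_t val = *pmd;

	if (pmd_none(val) || pmd_trans_huge(val))
		return 1;
	if (pmd_bad(val)) {
		bad_pmd_warnings++;
		fprintf(stderr, "bad pmd %p(%016lx)\n", (void *)pmd, val.val);
		return 1;
	}
	return 0;
}

/*
 * Hypothetical helper: pmd_devmap() MUST be evaluated first.
 * pmd_trans_unstable() treats a DAX huge PMD as bad and warns, and
 * || short-circuits, so a devmap entry never reaches that check.
 */
static int pmd_devmap_or_unstable(pmd_t *pmd)
{
	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
}
```

In this model, a DAX entry `{ MOCK_PRESENT | MOCK_HUGE | MOCK_DEVMAP }` makes the helper return 1 without touching `bad_pmd_warnings`, a regular THP entry also returns 1 silently, and a plain page-table entry `{ MOCK_PRESENT }` returns 0. Callers such as pte_alloc_one_map() and handle_pte_fault() would then call the helper instead of repeating the two checks.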
```
On Wed, May 17, 2017 at 10:33:58AM -0700, Dave Hansen wrote:
> On 05/17/2017 10:16 AM, Ross Zwisler wrote:
> > @@ -3061,7 +3061,7 @@ static int pte_alloc_one_map(struct vm_fault *vmf)
> >  	 * through an atomic read in C, which is what pmd_trans_unstable()
> >  	 * provides.
> >  	 */
> > -	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> > +	if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
> >  		return VM_FAULT_NOPAGE;
>
> I'm worried we are very unlikely to get this right in the future. It's
> totally not obvious what the ordering requirement is here.
>
> Could we move pmd_devmap() and pmd_trans_unstable() into a helper that
> gets the ordering right and also spells out the ordering requirement?

Sure, I'll fix this for v2. Thanks for the review.
```
```
On Wed 17-05-17 11:16:38, Ross Zwisler wrote:
> When the pmd_devmap() checks were added by:
>
> commit 5c7fb56e5e3f ("mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd")
>
> to add better support for DAX huge pages, they were all added to the end of
> if() statements after existing pmd_trans_huge() checks. So, things like:
>
> -	if (pmd_trans_huge(*pmd))
> +	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
>
> When further checks were added after pmd_trans_unstable() checks by:
>
> commit 7267ec008b5c ("mm: postpone page table allocation until we have page
> to map")
>
> they were also added at the end of the conditional:
>
> +	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
>
> This ordering is fine for pmd_trans_huge(), but doesn't work for
> pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd()
> check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
> pmd_trans_unstable()), which prints out a warning and returns 1. So, we do
> end up doing the right thing, but only after spamming dmesg with suspicious
> looking messages:
>
> mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
>
> Reorder these checks so that pmd_devmap() is checked first, avoiding the
> error messages.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
> Cc: stable@vger.kernel.org

With the change requested by Dave this looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

Honza

> ---
>  mm/memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6ff5d72..1ee269d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3061,7 +3061,7 @@ static int pte_alloc_one_map(struct vm_fault *vmf)
>  	 * through an atomic read in C, which is what pmd_trans_unstable()
>  	 * provides.
>  	 */
> -	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> +	if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
>  		return VM_FAULT_NOPAGE;
>
>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
> @@ -3690,7 +3690,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
>  		vmf->pte = NULL;
>  	} else {
>  		/* See comment in pte_alloc_one_map() */
> -		if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
> +		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
>  			return 0;
>  		/*
>  		 * A regular pmd is established and it can't morph into a huge
> --
> 2.9.4
>
```
```diff
diff --git a/mm/memory.c b/mm/memory.c
index 6ff5d72..1ee269d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3061,7 +3061,7 @@ static int pte_alloc_one_map(struct vm_fault *vmf)
 	 * through an atomic read in C, which is what pmd_trans_unstable()
 	 * provides.
 	 */
-	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
+	if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
 		return VM_FAULT_NOPAGE;
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
@@ -3690,7 +3690,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
 		vmf->pte = NULL;
 	} else {
 		/* See comment in pte_alloc_one_map() */
-		if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
+		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
 			return 0;
 		/*
 		 * A regular pmd is established and it can't morph into a huge
```
```
When the pmd_devmap() checks were added by:

commit 5c7fb56e5e3f ("mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd")

to add better support for DAX huge pages, they were all added to the end of
if() statements after existing pmd_trans_huge() checks. So, things like:

-	if (pmd_trans_huge(*pmd))
+	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))

When further checks were added after pmd_trans_unstable() checks by:

commit 7267ec008b5c ("mm: postpone page table allocation until we have page
to map")

they were also added at the end of the conditional:

+	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))

This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1. So, we do
end up doing the right thing, but only after spamming dmesg with suspicious
looking messages:

mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)

Reorder these checks so that pmd_devmap() is checked first, avoiding the
error messages.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Cc: stable@vger.kernel.org
---
 mm/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
```
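The fix relies only on C's left-to-right, short-circuit evaluation of `||`. A stripped-down userspace model makes the difference between the two orderings visible for a DAX PMD. Everything below is a hypothetical stand-in, not kernel code: `MOCK_DEVMAP`, the `warnings` counter and the names `check_before()`/`check_after()` are invented, and the mock `pmd_trans_unstable()` models just the one behavior the commit message describes (a DAX huge PMD warns and returns 1), treating every other entry as stable for brevity.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bit: the entry maps a DAX (device-memory) huge page. */
#define MOCK_DEVMAP 0x4UL

typedef struct { unsigned long val; } pmd_t;

static int warnings; /* number of simulated "bad pmd" messages */

static int pmd_devmap(pmd_t pmd) { return (pmd.val & MOCK_DEVMAP) != 0; }

/*
 * Hypothetical stand-in for pmd_trans_unstable(): per the commit message,
 * a DAX huge PMD trips the bad-pmd check, which warns and returns 1.
 * All other entries are modeled as stable (returns 0), purely for brevity.
 */
static int pmd_trans_unstable(pmd_t *pmd)
{
	if (pmd_devmap(*pmd)) {
		warnings++;
		fprintf(stderr, "bad pmd %p(%016lx)\n", (void *)pmd, pmd->val);
		return 1;
	}
	return 0;
}

/* Ordering before the patch: the warning fires before pmd_devmap() runs. */
static int check_before(pmd_t *pmd)
{
	return pmd_trans_unstable(pmd) || pmd_devmap(*pmd);
}

/* Ordering after the patch: || stops at pmd_devmap() for a DAX entry. */
static int check_after(pmd_t *pmd)
{
	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
}
```

For a DAX entry both functions return 1, which is the "right thing" the commit message mentions, but only `check_before()` increments `warnings`, mirroring the dmesg spam that the reordering removes.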