Message ID | 20250226162341.915535-1-bgeffon@google.com (mailing list archive)
---|---
State | New
Series | [v2] mm: fix finish_fault() handling for large folios
On 2025/2/27 00:23, Brian Geffon wrote:
> When handling faults for anon shmem, finish_fault() will attempt to install
> ptes for the entire folio. Unfortunately, if it encounters a single
> non-pte_none entry in that range it will bail, even if the pte that
> triggered the fault is still pte_none. When this situation happens, the
> fault will be retried endlessly, never making forward progress.
>
> This patch fixes this behavior: if it detects that a pte in the range
> is not pte_none, it will fall back to setting a single pte.
>
> Cc: stable@vger.kernel.org
> Cc: Hugh Dickins <hughd@google.com>
> Fixes: 43e027e41423 ("mm: memory: extend finish_fault() to support large folio")
> Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reported-by: Marek Maslanka <mmaslanka@google.com>
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> ---
>  mm/memory.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index b4d3d4893267..b6c467fdbfa4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5183,7 +5183,11 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>  		      !(vma->vm_flags & VM_SHARED);
>  	int type, nr_pages;
> -	unsigned long addr = vmf->address;
> +	unsigned long addr;
> +	bool needs_fallback = false;
> +
> +fallback:
> +	addr = vmf->address;
>
>  	/* Did we COW the page? */
>  	if (is_cow)
> @@ -5222,7 +5226,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  	 * approach also applies to non-anonymous-shmem faults to avoid
>  	 * inflating the RSS of the process.
>  	 */
> -	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> +	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
> +	    unlikely(needs_fallback)) {

Nit: can you align the code? Otherwise looks good to me.

>  		nr_pages = 1;
>  	} else if (nr_pages > 1) {
>  		pgoff_t idx = folio_page_idx(folio, page);
> @@ -5258,9 +5263,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  		ret = VM_FAULT_NOPAGE;
>  		goto unlock;
>  	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> -		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
> -		ret = VM_FAULT_NOPAGE;
> -		goto unlock;
> +		needs_fallback = true;
> +		pte_unmap_unlock(vmf->pte, vmf->ptl);
> +		goto fallback;
>  	}
>
>  	folio_ref_add(folio, nr_pages - 1);
On Thu, Feb 27, 2025 at 2:34 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> On 2025/2/27 00:23, Brian Geffon wrote:
> > When handling faults for anon shmem, finish_fault() will attempt to install
> > ptes for the entire folio. Unfortunately, if it encounters a single
> > non-pte_none entry in that range it will bail, even if the pte that
> > triggered the fault is still pte_none. When this situation happens, the
> > fault will be retried endlessly, never making forward progress.
> >
> > This patch fixes this behavior: if it detects that a pte in the range
> > is not pte_none, it will fall back to setting a single pte.
> >
> > Cc: stable@vger.kernel.org
> > Cc: Hugh Dickins <hughd@google.com>
> > Fixes: 43e027e41423 ("mm: memory: extend finish_fault() to support large folio")
> > Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Reported-by: Marek Maslanka <mmaslanka@google.com>
> > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > ---
> >  mm/memory.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index b4d3d4893267..b6c467fdbfa4 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5183,7 +5183,11 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >  	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> >  		      !(vma->vm_flags & VM_SHARED);
> >  	int type, nr_pages;
> > -	unsigned long addr = vmf->address;
> > +	unsigned long addr;
> > +	bool needs_fallback = false;
> > +
> > +fallback:
> > +	addr = vmf->address;
> >
> >  	/* Did we COW the page? */
> >  	if (is_cow)
> > @@ -5222,7 +5226,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >  	 * approach also applies to non-anonymous-shmem faults to avoid
> >  	 * inflating the RSS of the process.
> >  	 */
> > -	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> > +	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
> > +	    unlikely(needs_fallback)) {
>
> Nit: can you align the code? Otherwise looks good to me.

I mailed a v3 with adjusted alignment, I'll let Andrew decide which
variation he prefers.

> >  		nr_pages = 1;
> >  	} else if (nr_pages > 1) {
> >  		pgoff_t idx = folio_page_idx(folio, page);
> > @@ -5258,9 +5263,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >  		ret = VM_FAULT_NOPAGE;
> >  		goto unlock;
> >  	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> > -		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
> > -		ret = VM_FAULT_NOPAGE;
> > -		goto unlock;
> > +		needs_fallback = true;
> > +		pte_unmap_unlock(vmf->pte, vmf->ptl);
> > +		goto fallback;
> >  	}
> >
> >  	folio_ref_add(folio, nr_pages - 1);

Thanks for looking
Brian
diff --git a/mm/memory.c b/mm/memory.c
index b4d3d4893267..b6c467fdbfa4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5183,7 +5183,11 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
 		      !(vma->vm_flags & VM_SHARED);
 	int type, nr_pages;
-	unsigned long addr = vmf->address;
+	unsigned long addr;
+	bool needs_fallback = false;
+
+fallback:
+	addr = vmf->address;

 	/* Did we COW the page? */
 	if (is_cow)
@@ -5222,7 +5226,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	 * approach also applies to non-anonymous-shmem faults to avoid
 	 * inflating the RSS of the process.
 	 */
-	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
+	    unlikely(needs_fallback)) {
 		nr_pages = 1;
 	} else if (nr_pages > 1) {
 		pgoff_t idx = folio_page_idx(folio, page);
@@ -5258,9 +5263,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		ret = VM_FAULT_NOPAGE;
 		goto unlock;
 	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
-		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
-		ret = VM_FAULT_NOPAGE;
-		goto unlock;
+		needs_fallback = true;
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		goto fallback;
 	}

 	folio_ref_add(folio, nr_pages - 1);
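In short: the old code returned VM_FAULT_NOPAGE on finding any populated pte in the folio's range, which left the faulting pte itself unmapped, so the same fault fired again immediately. The patched code instead drops the pte lock and jumps back to the top, where needs_fallback forces the single-pte path. A condensed paraphrase of the resulting control flow (locking setup, error paths, and unrelated logic elided, so this is illustrative rather than compilable):

```c
/*
 * Condensed paraphrase of finish_fault() after this patch; most of
 * the function body is elided ("...") for readability.
 */
vm_fault_t finish_fault(struct vm_fault *vmf)
{
	bool needs_fallback = false;
	unsigned long addr;
	int nr_pages;

fallback:
	addr = vmf->address;
	/* ... look up the folio/page and compute nr_pages ... */

	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
	    unlikely(needs_fallback))
		nr_pages = 1;	/* the retry maps only the faulting pte */

	/* ... map the page table and take the pte lock, then: ... */
	if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
		/*
		 * Some other pte in the folio's range is already
		 * populated. Previously this returned VM_FAULT_NOPAGE,
		 * leaving the faulting pte unmapped and refaulting
		 * forever. Instead, drop the lock and retry in
		 * single-pte mode.
		 */
		needs_fallback = true;
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		goto fallback;
	}

	/* ... install nr_pages ptes and finish ... */
}
```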
When handling faults for anon shmem, finish_fault() will attempt to install
ptes for the entire folio. Unfortunately, if it encounters a single
non-pte_none entry in that range it will bail, even if the pte that
triggered the fault is still pte_none. When this situation happens, the
fault will be retried endlessly, never making forward progress.

This patch fixes this behavior: if it detects that a pte in the range
is not pte_none, it will fall back to setting a single pte.

Cc: stable@vger.kernel.org
Cc: Hugh Dickins <hughd@google.com>
Fixes: 43e027e41423 ("mm: memory: extend finish_fault() to support large folio")
Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: Marek Maslanka <mmaslanka@google.com>
Signed-off-by: Brian Geffon <bgeffon@google.com>
---
 mm/memory.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
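For illustration, here is a hypothetical userspace sketch of the kind of sequence that could hit this path; it is not taken from the report. It assumes a kernel where the shmem mapping is backed by a pte-mapped large folio, e.g. with a 64kB mTHP size enabled via /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled (the folio size and sysfs knob are assumptions):

```c
/*
 * Hypothetical reproducer sketch (not from the bug report). Assumes
 * shmem mTHP is enabled so this MAP_SHARED | MAP_ANONYMOUS mapping
 * ends up backed by a single pte-mapped large folio.
 */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t folio_sz = 64 * 1024;	/* assumed mTHP folio size */
	long page_sz = sysconf(_SC_PAGESIZE);

	/* MAP_SHARED | MAP_ANONYMOUS is anon shmem. */
	char *p = mmap(NULL, folio_sz, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Fault in the whole range; may allocate one large folio. */
	memset(p, 1, folio_sz);

	/* Zap a single pte within the folio; the folio itself stays. */
	madvise(p, page_sz, MADV_DONTNEED);

	/*
	 * Refault the zapped page: finish_fault() sees nr_pages > 1
	 * but finds a non-none pte in the range. Without the fix it
	 * bails with VM_FAULT_NOPAGE and the fault retries forever.
	 */
	p[0] = 2;
	return 0;
}
```

On an affected kernel the final write could livelock as described in the commit message; with the fix, finish_fault() falls back to installing the single faulting pte and the write completes.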