Message ID | 20250131143749.1435006-1-rcn@igalia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm,madvise,hugetlb: check for 0-length range after end address adjustment | expand |
On Fri, 31 Jan 2025 15:37:49 +0100 Ricardo Cañuelo Navarro <rcn@igalia.com> wrote: > Add a sanity check to madvise_dontneed_free() to address a corner case > in madvise where a race condition causes the current vma being processed > to be backed by a different page size. > > During a madvise(MADV_DONTNEED) call on a memory region registered with > a userfaultfd, there's a period of time where the process mm lock is > temporarily released in order to send a UFFD_EVENT_REMOVE and let > userspace handle the event. During this time, the vma covering the > current address range may change due to an explicit mmap done > concurrently by another thread. > > If, after that change, the memory region, which was originally backed by > 4KB pages, is now backed by hugepages, the end address is rounded down > to a hugepage boundary to avoid data loss (see "Fixes" below). This > rounding may cause the end address to be truncated to the same address > as the start. > > Make this corner case follow the same semantics as in other similar > cases where the requested region has zero length (ie. return 0). > > This will make madvise_walk_vmas() continue to the next vma in the > range (this time holding the process mm lock) which, due to the prev > pointer becoming stale because of the vma change, will be the same > hugepage-backed vma that was just checked before. The next time > madvise_dontneed_free() runs for this vma, if the start address isn't > aligned to a hugepage boundary, it'll return -EINVAL, which is also in > line with the madvise api. > > >From userspace perspective, madvise() will return EINVAL because the > start address isn't aligned according to the new vma alignment > requirements (hugepage), even though it was correctly page-aligned when > the call was issued. > > ... > > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -933,7 +933,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma, > */ > end = vma->vm_end; > } > - VM_WARN_ON(start >= end); > + if (start == end) > + return 0; > + VM_WARN_ON(start > end); > } Perhaps add a comment telling the user how this situation can come about?
Hi Andrew,
On Fri, Jan 31 2025 at 14:23:21, Andrew Morton <akpm@linux-foundation.org> wrote:
> Perhaps add a comment telling the user how this situation can come about?
Sure, I'll add the description to v2. Thanks for reviewing it!
Ricardo
diff --git a/mm/madvise.c b/mm/madvise.c index 49f3a75046f6..c8e28d51978a 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -933,7 +933,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma, */ end = vma->vm_end; } - VM_WARN_ON(start >= end); + if (start == end) + return 0; + VM_WARN_ON(start > end); } if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
Add a sanity check to madvise_dontneed_free() to address a corner case in madvise where a race condition causes the current vma being processed to be backed by a different page size. During a madvise(MADV_DONTNEED) call on a memory region registered with a userfaultfd, there's a period of time where the process mm lock is temporarily released in order to send a UFFD_EVENT_REMOVE and let userspace handle the event. During this time, the vma covering the current address range may change due to an explicit mmap done concurrently by another thread. If, after that change, the memory region, which was originally backed by 4KB pages, is now backed by hugepages, the end address is rounded down to a hugepage boundary to avoid data loss (see "Fixes" below). This rounding may cause the end address to be truncated to the same address as the start. Make this corner case follow the same semantics as in other similar cases where the requested region has zero length (ie. return 0). This will make madvise_walk_vmas() continue to the next vma in the range (this time holding the process mm lock) which, due to the prev pointer becoming stale because of the vma change, will be the same hugepage-backed vma that was just checked before. The next time madvise_dontneed_free() runs for this vma, if the start address isn't aligned to a hugepage boundary, it'll return -EINVAL, which is also in line with the madvise api. From userspace perspective, madvise() will return EINVAL because the start address isn't aligned according to the new vma alignment requirements (hugepage), even though it was correctly page-aligned when the call was issued. Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs") Cc: stable@vger.kernel.org Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> --- mm/madvise.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)