Message ID: 20240725183955.2268884-1-david@redhat.com (mailing list archive)
Series: mm/hugetlb: fix hugetlb vs. core-mm PT locking
On Thu, 25 Jul 2024 20:39:53 +0200 David Hildenbrand <david@redhat.com> wrote:

> Working on another generic page table walker that tries to avoid
> special-casing hugetlb, I found a page table locking issue with hugetlb
> folios that are not mapped using a single PMD/PUD.
>
> For some hugetlb folio sizes, GUP will take different page table locks
> when walking the page tables than hugetlb takes when modifying the page
> tables.
>
> I did not actually try reproducing an issue, but looking at
> follow_pmd_mask(), where we might be rereading a PMD value multiple
> times, it's rather clear that concurrent modifications are rather
> unpleasant.
>
> In follow_page_pte() we might be better off in that regard -- ptep_get()
> does a READ_ONCE() -- but who knows what else could happen concurrently
> in some weird corner cases (e.g., hugetlb folio getting unmapped and
> freed).
>
> Did some basic sanity testing with various hugetlb sizes on x86-64 and
> arm64. Maybe I'll find some time to actually write a simple reproducer
> in the coming weeks, so this wouldn't have to stay all-theoretical.

When can we be confident that this change is merge-worthy?

> Only v6.10 is affected, so #1 can simply be backported as a prereq
> patch along with the real fix.

I'll add the same Fixes: to [1/2], and cc:stable.
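To make the mismatch concrete: on arm64 with 4K base pages, hugetlb supports 64K (cont-PTE), 2M (PMD), 32M (cont-PMD) and 1G (PUD) folios. The following is a small, hypothetical userspace model, not kernel code: `hugetlb_lock()` and `gup_lock()` below are stand-ins for my reading of the pre-fix `huge_pte_lockptr()` and GUP's per-level locking, and may not match any given kernel version exactly. It just prints which lock each side would pick per folio size:

```c
/*
 * Hypothetical model of the pre-fix lock selection -- not kernel code.
 * hugetlb_lock() mimics the old huge_pte_lockptr(): only PMD-sized
 * folios get a split lock, everything else uses the per-MM lock.
 * gup_lock() mimics core-mm GUP, which locks the level it walks.
 */
#include <stdio.h>
#include <string.h>

#define KB(x) ((unsigned long)(x) << 10)
#define MB(x) ((unsigned long)(x) << 20)
#define GB(x) ((unsigned long)(x) << 30)

/* arm64 with 4K base pages */
#define PMD_SIZE MB(2)
#define PUD_SIZE GB(1)

static const char *hugetlb_lock(unsigned long size)
{
	if (size == PMD_SIZE)
		return "split PMD lock";
	return "mm->page_table_lock";
}

static const char *gup_lock(unsigned long size)
{
	if (size >= PUD_SIZE)
		return "mm->page_table_lock"; /* PUD level */
	if (size >= PMD_SIZE)
		return "split PMD lock";      /* follow_pmd_mask() */
	return "split PTE lock";              /* follow_page_pte() */
}

int main(void)
{
	const unsigned long sizes[] = { KB(64), MB(2), MB(32), GB(1) };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		const char *h = hugetlb_lock(sizes[i]);
		const char *g = gup_lock(sizes[i]);

		printf("%8luK: hugetlb: %-20s GUP: %-16s%s\n",
		       sizes[i] >> 10, h, g,
		       strcmp(h, g) ? "  <-- mismatch" : "");
	}
	return 0;
}
```

The 64K (cont-PTE) and 32M (cont-PMD) rows are the problematic ones: hugetlb serializes on the per-MM lock while GUP serializes on a split PT lock, so the two never exclude each other.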
On 25.07.24 22:41, Andrew Morton wrote:
> On Thu, 25 Jul 2024 20:39:53 +0200 David Hildenbrand <david@redhat.com> wrote:
>
>> Working on another generic page table walker that tries to avoid
>> special-casing hugetlb, I found a page table locking issue with hugetlb
>> folios that are not mapped using a single PMD/PUD.
>>
>> For some hugetlb folio sizes, GUP will take different page table locks
>> when walking the page tables than hugetlb takes when modifying the page
>> tables.
>>
>> I did not actually try reproducing an issue, but looking at
>> follow_pmd_mask(), where we might be rereading a PMD value multiple
>> times, it's rather clear that concurrent modifications are rather
>> unpleasant.
>>
>> In follow_page_pte() we might be better off in that regard -- ptep_get()
>> does a READ_ONCE() -- but who knows what else could happen concurrently
>> in some weird corner cases (e.g., hugetlb folio getting unmapped and
>> freed).
>>
>> Did some basic sanity testing with various hugetlb sizes on x86-64 and
>> arm64. Maybe I'll find some time to actually write a simple reproducer
>> in the coming weeks, so this wouldn't have to stay all-theoretical.
>
> When can we be confident that this change is merge-worthy?

I'm convinced that it is the right thing to do, but I don't think we
have to rush this.

As Baolin notes, we fixed the same issue in the past, unfortunately also
without a reproducer, IIUC. I'll try to reproduce the race, but I'm not
100% sure I'll manage to do so.

So it's certainly merge-worthy after it has had a bit of exposure to
-next, but there's no need to rush this upstream.

Thanks!
On 26.07.24 11:19, David Hildenbrand wrote:
> On 25.07.24 22:41, Andrew Morton wrote:
>> On Thu, 25 Jul 2024 20:39:53 +0200 David Hildenbrand <david@redhat.com> wrote:
>>
>>> Working on another generic page table walker that tries to avoid
>>> special-casing hugetlb, I found a page table locking issue with hugetlb
>>> folios that are not mapped using a single PMD/PUD.
>>>
>>> For some hugetlb folio sizes, GUP will take different page table locks
>>> when walking the page tables than hugetlb takes when modifying the page
>>> tables.
>>>
>>> I did not actually try reproducing an issue, but looking at
>>> follow_pmd_mask(), where we might be rereading a PMD value multiple
>>> times, it's rather clear that concurrent modifications are rather
>>> unpleasant.
>>>
>>> In follow_page_pte() we might be better off in that regard -- ptep_get()
>>> does a READ_ONCE() -- but who knows what else could happen concurrently
>>> in some weird corner cases (e.g., hugetlb folio getting unmapped and
>>> freed).
>>>
>>> Did some basic sanity testing with various hugetlb sizes on x86-64 and
>>> arm64. Maybe I'll find some time to actually write a simple reproducer
>>> in the coming weeks, so this wouldn't have to stay all-theoretical.
>>
>> When can we be confident that this change is merge-worthy?
>
> I'm convinced that it is the right thing to do, but I don't think we
> have to rush this.
>
> As Baolin notes, we fixed the same issue in the past, unfortunately also
> without a reproducer, IIUC. I'll try to reproduce the race, but I'm not
> 100% sure I'll manage to do so.

Okay, so running the reproducer I pulled out of my magic hat against
this series: it seems to properly fix the issue.
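The reproducer itself is not shown in this part of the thread. Purely as an illustration of the shape such a test could take -- a hypothetical, untested sketch, not the actual reproducer -- one could race a slow-path GUP user (reading `/proc/self/mem`, which takes the split PTE lock in follow_page_pte()) against MADV_DONTNEED-plus-refault on a cont-PTE hugetlb mapping (which, pre-fix, serializes on mm->page_table_lock). Both sides take mmap_lock only for reading, so the PT lock is the only thing left to exclude them. This assumes an arm64/4K system with a 64K hugetlb pool configured and MADV_DONTNEED support for hugetlb (v5.18+):

```c
/*
 * Hypothetical sketch -- NOT the reproducer referenced above, and not
 * verified to trigger visible corruption. On arm64 with 4K pages, a
 * 64K hugetlb folio is mapped via cont-PTEs, so the pre-fix hugetlb
 * code and GUP take different PT locks; nothing serializes the zap /
 * refault below against the concurrent GUP walk.
 * Assumes a 64K hugetlb pool (e.g., hugepagesz=64K hugepages=8).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_64KB
#define MAP_HUGE_64KB (16 << MAP_HUGE_SHIFT)
#endif
#define SIZE (64UL << 10)

static char *map;
static volatile int stop;

/* Zap and re-fault the folio: both run under mmap_read_lock and, pre-fix,
 * serialize PTE access via hugetlb's per-MM page_table_lock. */
static void *zap_thread(void *arg)
{
	while (!stop) {
		madvise(map, SIZE, MADV_DONTNEED);
		map[0] = 1;	/* hugetlb_fault() re-populates */
	}
	return NULL;
}

/* Trigger slow-path GUP on the same range: also mmap_read_lock, but the
 * PTE walk locks the split PTE lock instead. */
static void *gup_thread(void *arg)
{
	int fd = open("/proc/self/mem", O_RDONLY);
	char c;

	while (!stop)
		pread(fd, &c, 1, (off_t)(uintptr_t)map);
	close(fd);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	map = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_64KB,
		   -1, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(map, 0, SIZE);	/* fault the folio in */

	pthread_create(&a, NULL, zap_thread, NULL);
	pthread_create(&b, NULL, gup_thread, NULL);
	sleep(10);
	stop = 1;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
```

Whether this particular race window produces an observable failure depends on timing and architecture details; the point is only that, pre-fix, the two PTE users never take the same lock.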