Message ID | 20240404163642.1125529-1-david@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | s390: page_mapcount(), page_has_private() and PG_arch_1 | expand |
On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote: > On my journey to remove page_mapcount(), I got hooked up on other folio > cleanups that Willy most certainly will enjoy. > > This series removes the s390x usage of: > * page_mapcount() [patches WIP] > * page_has_private() [have patches to remove it] > > ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail > pages of large folios). > > Further, one "easy" fix upfront. Looks like you didn't see: https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@infradead.org/ > ... unfortunately there is one other issue I spotted that I am not > tackling in this series, because I am not 100% sure what we want to > do: the usage of page_ref_freeze()/folio_ref_freeze() in > make_folio_secure() is unsafe. :( > > In make_folio_secure(), we're holding the folio lock, the mmap lock and > the PT lock. So we are protected against concurrent fork(), zap, GUP, > swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should > also block concurrent GUP-fast very reliably. > > But if the folio is mapped into multiple page tables, we could see > concurrent zapping of the folio, a pagecache folios could get mapped/ > accessed concurrent, we could see fork() sharing the page in another > process, GUP ... trying to adjust the folio refcount while we froze it. > Very bad. Hmmm. Why is that not then a problem for, eg, splitting or migrating? Is it because they unmap first and then try to freeze? > For anonymous folios, it would likely be sufficient to check that > folio_mapcount() == 1. For pagecache folios, that's insufficient, likely > we would have to lock the pagecache. To handle folios mapped into > multiple page tables, we would have to do what > split_huge_page_to_list_to_order() does (temporary migration entries). > > So it's a bit more involved, and I'll have to leave that to s390x folks to > figure out. There are othe reasonable cleanups I think, but I'll have to > focus on other stuff. > > Compile tested, but not runtime tested, I'll appreiate some testing help > from people with UV access and experience. > > Cc: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Heiko Carstens <hca@linux.ibm.com> > Cc: Vasily Gorbik <gor@linux.ibm.com> > Cc: Alexander Gordeev <agordeev@linux.ibm.com> > Cc: Christian Borntraeger <borntraeger@linux.ibm.com> > Cc: Sven Schnelle <svens@linux.ibm.com> > Cc: Janosch Frank <frankja@linux.ibm.com> > Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Thomas Huth <thuth@redhat.com> > > David Hildenbrand (5): > s390/uv: don't call wait_on_page_writeback() without a reference > s390/uv: convert gmap_make_secure() to work on folios > s390/uv: convert PG_arch_1 users to only work on small folios > s390/uv: update PG_arch_1 comment > s390/hugetlb: convert PG_arch_1 code to work on folio->flags > > arch/s390/include/asm/page.h | 2 + > arch/s390/kernel/uv.c | 112 ++++++++++++++++++++++------------- > arch/s390/mm/gmap.c | 4 +- > arch/s390/mm/hugetlbpage.c | 8 +-- > 4 files changed, 79 insertions(+), 47 deletions(-) > > -- > 2.44.0 >
On 05.04.24 05:42, Matthew Wilcox wrote: > On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote: >> On my journey to remove page_mapcount(), I got hooked up on other folio >> cleanups that Willy most certainly will enjoy. >> >> This series removes the s390x usage of: >> * page_mapcount() [patches WIP] >> * page_has_private() [have patches to remove it] >> >> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail >> pages of large folios). >> >> Further, one "easy" fix upfront. > > Looks like you didn't see: > > https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@infradead.org/ Yes, I only skimmed linux-mm. I think s390x certainly wants to handle PTE-mapped THP in that code, I think there are ways to trigger that, we're just mostly lucky that it doesn't happen in the common case. But thinking about it, the current page_mapcount() based check could not possibly have worked for them and rejected any PTE-mapped THP. So I can just base my changes on top of yours (we might want to get the first fix in ahead of time). > >> ... unfortunately there is one other issue I spotted that I am not >> tackling in this series, because I am not 100% sure what we want to >> do: the usage of page_ref_freeze()/folio_ref_freeze() in >> make_folio_secure() is unsafe. :( >> >> In make_folio_secure(), we're holding the folio lock, the mmap lock and >> the PT lock. So we are protected against concurrent fork(), zap, GUP, >> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should >> also block concurrent GUP-fast very reliably. >> >> But if the folio is mapped into multiple page tables, we could see >> concurrent zapping of the folio, a pagecache folios could get mapped/ >> accessed concurrent, we could see fork() sharing the page in another >> process, GUP ... trying to adjust the folio refcount while we froze it. >> Very bad. > > Hmmm. Why is that not then a problem for, eg, splitting or migrating? > Is it because they unmap first and then try to freeze? Yes, exactly. Using mapcount in combination with ref freezing is problematic. Except maybe for anonymous folios with mapcount=1, while holding a bunch of locks to stop anybody from stumbling over that.