Message ID | 20250408095222.860601-3-kevin.brodsky@arm.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Always call constructor for kernel page tables | expand |
On 4/8/25 02:52, Kevin Brodsky wrote: > Page table pages are normally freed using the appropriate helper for > the given page table level. On x86, pud_free_pmd_page() and > pmd_free_pte_page() are an exception to the rule: they call > free_page() directly. > > Constructor/destructor calls are about to be introduced for kernel > PTEs. To avoid missing dtor calls in those helpers, free the PTE > pages using pte_free_kernel() instead of free_page(). > > While at it also use pmd_free() instead of calling pagetable_dtor() > explicitly at the PMD level. Looks sane and adding consistency is nice. Are there any tests for folio_test_pgtable() at free_page() time? If we had that, it would make it less likely that another free_page() user could sneak in without calling the destructor. Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote: > Are there any tests for folio_test_pgtable() at free_page() time? If we > had that, it would make it less likely that another free_page() user > could sneak in without calling the destructor. It's hidden, but yes: static inline bool page_expected_state(struct page *page, unsigned long check_flags) { if (unlikely(atomic_read(&page->_mapcount) != -1)) return false; PageTable uses page_type which aliases with mapcount, so this check covers "PageTable is still set when the last refcount to it is put". I don't think we really use the page refcount when allocating/freeing page tables. Anyone want to try switching it over to using alloc_frozen_pages() / free_frozen_pages()? Might need to move that API out of mm/internal.h ...
On 4/8/25 09:37, Matthew Wilcox wrote: > On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote: >> Are there any tests for folio_test_pgtable() at free_page() time? If we >> had that, it would make it less likely that another free_page() user >> could sneak in without calling the destructor. > It's hidden, but yes: > > static inline bool page_expected_state(struct page *page, > unsigned long check_flags) > { > if (unlikely(atomic_read(&page->_mapcount) != -1)) > return false; > > PageTable uses page_type which aliases with mapcount, so this check > covers "PageTable is still set when the last refcount to it is put". Huh, so shouldn't we have ended up in bad_page() for these, other than: pagetable_dtor(virt_to_ptdesc(pmd)); free_page((unsigned long)pmd); ?
On Tue, Apr 08, 2025 at 09:54:42AM -0700, Dave Hansen wrote: > On 4/8/25 09:37, Matthew Wilcox wrote: > > On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote: > >> Are there any tests for folio_test_pgtable() at free_page() time? If we > >> had that, it would make it less likely that another free_page() user > >> could sneak in without calling the destructor. > > It's hidden, but yes: > > > > static inline bool page_expected_state(struct page *page, > > unsigned long check_flags) > > { > > if (unlikely(atomic_read(&page->_mapcount) != -1)) > > return false; > > > > PageTable uses page_type which aliases with mapcount, so this check > > covers "PageTable is still set when the last refcount to it is put". > > Huh, so shouldn't we have ended up in bad_page() for these, other than: > > pagetable_dtor(virt_to_ptdesc(pmd)); > free_page((unsigned long)pmd); I think at this point in Kevin's series, we don't call the ctor for these pages, so we never set PageTable() on them. I could be wrong; as Kevin says, this is all very twisty and confusing with exceptions and exceptions to exceptions. This series should reduce the confusion.
On 4/8/25 10:40, Matthew Wilcox wrote: > I think at this point in Kevin's series, we don't call the ctor for > these pages, so we never set PageTable() on them. I could be wrong; > as Kevin says, this is all very twisty and confusing with exceptions and > exceptions to exceptions. This series should reduce the confusion. Oh, for sure. I didn't mean to detract from the series as a whole. It totally looks like a good idea!
On 08/04/2025 19:40, Matthew Wilcox wrote: > On Tue, Apr 08, 2025 at 09:54:42AM -0700, Dave Hansen wrote: >> On 4/8/25 09:37, Matthew Wilcox wrote: >>> On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote: >>>> Are there any tests for folio_test_pgtable() at free_page() time? If we >>>> had that, it would make it less likely that another free_page() user >>>> could sneak in without calling the destructor. >>> It's hidden, but yes: >>> >>> static inline bool page_expected_state(struct page *page, >>> unsigned long check_flags) >>> { >>> if (unlikely(atomic_read(&page->_mapcount) != -1)) >>> return false; >>> >>> PageTable uses page_type which aliases with mapcount, so this check >>> covers "PageTable is still set when the last refcount to it is put". >> Huh, so shouldn't we have ended up in bad_page() for these, other than: >> >> pagetable_dtor(virt_to_ptdesc(pmd)); >> free_page((unsigned long)pmd); > I think at this point in Kevin's series, we don't call the ctor for > these pages, so we never set PageTable() on them. I could be wrong; Correct, that's why I added this patch early in the series (the next patch adds the ctor call in pte_alloc_one_kernel()). The BUG() in v1 was indeed triggered by a page_expected_state() check [1]. > as Kevin says, this is all very twisty and confusing with exceptions and > exceptions to exceptions. This series should reduce the confusion. I hope so! - Kevin [1] https://lore.kernel.org/oe-lkp/202503211612.e11bd73f-lkp@intel.com/
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 7930f234c5f6..1dee9bdbeea5 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -809,14 +809,13 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr) for (i = 0; i < PTRS_PER_PMD; i++) { if (!pmd_none(pmd_sv[i])) { pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]); - free_page((unsigned long)pte); + pte_free_kernel(&init_mm, pte); } } free_page((unsigned long)pmd_sv); - pagetable_dtor(virt_to_ptdesc(pmd)); - free_page((unsigned long)pmd); + pmd_free(&init_mm, pmd); return 1; } @@ -839,7 +838,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) /* INVLPG to clear all paging-structure caches */ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); - free_page((unsigned long)pte); + pte_free_kernel(&init_mm, pte); return 1; }
Page table pages are normally freed using the appropriate helper for the given page table level. On x86, pud_free_pmd_page() and pmd_free_pte_page() are an exception to the rule: they call free_page() directly. Constructor/destructor calls are about to be introduced for kernel PTEs. To avoid missing dtor calls in those helpers, free the PTE pages using pte_free_kernel() instead of free_page(). While at it also use pmd_free() instead of calling pagetable_dtor() explicitly at the PMD level. Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> --- arch/x86/mm/pgtable.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)