diff mbox series

[v2,02/12] x86: pgtable: Always use pte_free_kernel()

Message ID 20250408095222.860601-3-kevin.brodsky@arm.com (mailing list archive)
State New
Headers show
Series Always call constructor for kernel page tables | expand

Commit Message

Kevin Brodsky April 8, 2025, 9:52 a.m. UTC
Page table pages are normally freed using the appropriate helper for
the given page table level. On x86, pud_free_pmd_page() and
pmd_free_pte_page() are an exception to the rule: they call
free_page() directly.

Constructor/destructor calls are about to be introduced for kernel
PTEs. To avoid missing dtor calls in those helpers, free the PTE
pages using pte_free_kernel() instead of free_page().

While at it also use pmd_free() instead of calling pagetable_dtor()
explicitly at the PMD level.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/x86/mm/pgtable.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Dave Hansen April 8, 2025, 3:22 p.m. UTC | #1
On 4/8/25 02:52, Kevin Brodsky wrote:
> Page table pages are normally freed using the appropriate helper for
> the given page table level. On x86, pud_free_pmd_page() and
> pmd_free_pte_page() are an exception to the rule: they call
> free_page() directly.
> 
> Constructor/destructor calls are about to be introduced for kernel
> PTEs. To avoid missing dtor calls in those helpers, free the PTE
> pages using pte_free_kernel() instead of free_page().
> 
> While at it also use pmd_free() instead of calling pagetable_dtor()
> explicitly at the PMD level.

Looks sane and adding consistency is nice.

Are there any tests for folio_test_pgtable() at free_page() time? If we
had that, it would make it less likely that another free_page() user
could sneak in without calling the destructor.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Matthew Wilcox April 8, 2025, 4:37 p.m. UTC | #2
On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
> Are there any tests for folio_test_pgtable() at free_page() time? If we
> had that, it would make it less likely that another free_page() user
> could sneak in without calling the destructor.

It's hidden, but yes:

static inline bool page_expected_state(struct page *page,
                                        unsigned long check_flags)
{
        if (unlikely(atomic_read(&page->_mapcount) != -1))
                return false;

PageTable uses page_type which aliases with mapcount, so this check
covers "PageTable is still set when the last refcount to it is put".

I don't think we really use the page refcount when allocating/freeing
page tables.  Anyone want to try switching it over to using
alloc_frozen_pages() / free_frozen_pages()?  Might need to move that API
out of mm/internal.h ...
Dave Hansen April 8, 2025, 4:54 p.m. UTC | #3
On 4/8/25 09:37, Matthew Wilcox wrote:
> On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
>> Are there any tests for folio_test_pgtable() at free_page() time? If we
>> had that, it would make it less likely that another free_page() user
>> could sneak in without calling the destructor.
> It's hidden, but yes:
> 
> static inline bool page_expected_state(struct page *page,
>                                         unsigned long check_flags)
> {
>         if (unlikely(atomic_read(&page->_mapcount) != -1))
>                 return false;
> 
> PageTable uses page_type which aliases with mapcount, so this check
> covers "PageTable is still set when the last refcount to it is put".

Huh, so shouldn't we have ended up in bad_page() for these, other than:

        pagetable_dtor(virt_to_ptdesc(pmd));
        free_page((unsigned long)pmd);

?
Matthew Wilcox April 8, 2025, 5:40 p.m. UTC | #4
On Tue, Apr 08, 2025 at 09:54:42AM -0700, Dave Hansen wrote:
> On 4/8/25 09:37, Matthew Wilcox wrote:
> > On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
> >> Are there any tests for folio_test_pgtable() at free_page() time? If we
> >> had that, it would make it less likely that another free_page() user
> >> could sneak in without calling the destructor.
> > It's hidden, but yes:
> > 
> > static inline bool page_expected_state(struct page *page,
> >                                         unsigned long check_flags)
> > {
> >         if (unlikely(atomic_read(&page->_mapcount) != -1))
> >                 return false;
> > 
> > PageTable uses page_type which aliases with mapcount, so this check
> > covers "PageTable is still set when the last refcount to it is put".
> 
> Huh, so shouldn't we have ended up in bad_page() for these, other than:
> 
>         pagetable_dtor(virt_to_ptdesc(pmd));
>         free_page((unsigned long)pmd);

I think at this point in Kevin's series, we don't call the ctor for
these pages, so we never set PageTable() on them.  I could be wrong;
as Kevin says, this is all very twisty and confusing with exceptions and
exceptions to exceptions.  This series should reduce the confusion.
Dave Hansen April 8, 2025, 5:42 p.m. UTC | #5
On 4/8/25 10:40, Matthew Wilcox wrote:
> I think at this point in Kevin's series, we don't call the ctor for
> these pages, so we never set PageTable() on them.  I could be wrong;
> as Kevin says, this is all very twisty and confusing with exceptions and
> exceptions to exceptions.  This series should reduce the confusion.

Oh, for sure. I didn't mean to detract from the series as a whole. It
totally looks like a good idea!
Kevin Brodsky April 9, 2025, 2:50 p.m. UTC | #6
On 08/04/2025 19:40, Matthew Wilcox wrote:
> On Tue, Apr 08, 2025 at 09:54:42AM -0700, Dave Hansen wrote:
>> On 4/8/25 09:37, Matthew Wilcox wrote:
>>> On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote:
>>>> Are there any tests for folio_test_pgtable() at free_page() time? If we
>>>> had that, it would make it less likely that another free_page() user
>>>> could sneak in without calling the destructor.
>>> It's hidden, but yes:
>>>
>>> static inline bool page_expected_state(struct page *page,
>>>                                         unsigned long check_flags)
>>> {
>>>         if (unlikely(atomic_read(&page->_mapcount) != -1))
>>>                 return false;
>>>
>>> PageTable uses page_type which aliases with mapcount, so this check
>>> covers "PageTable is still set when the last refcount to it is put".
>> Huh, so shouldn't we have ended up in bad_page() for these, other than:
>>
>>         pagetable_dtor(virt_to_ptdesc(pmd));
>>         free_page((unsigned long)pmd);
> I think at this point in Kevin's series, we don't call the ctor for
> these pages, so we never set PageTable() on them. I could be wrong;

Correct, that's why I added this patch early in the series (the next
patch adds the ctor call in pte_alloc_one_kernel()).

The BUG() in v1 was indeed triggered by a page_expected_state() check [1].

> as Kevin says, this is all very twisty and confusing with exceptions and
> exceptions to exceptions.  This series should reduce the confusion.

I hope so!

- Kevin

[1] https://lore.kernel.org/oe-lkp/202503211612.e11bd73f-lkp@intel.com/
diff mbox series

Patch

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 7930f234c5f6..1dee9bdbeea5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -809,14 +809,13 @@  int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 	for (i = 0; i < PTRS_PER_PMD; i++) {
 		if (!pmd_none(pmd_sv[i])) {
 			pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
-			free_page((unsigned long)pte);
+			pte_free_kernel(&init_mm, pte);
 		}
 	}
 
 	free_page((unsigned long)pmd_sv);
 
-	pagetable_dtor(virt_to_ptdesc(pmd));
-	free_page((unsigned long)pmd);
+	pmd_free(&init_mm, pmd);
 
 	return 1;
 }
@@ -839,7 +838,7 @@  int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	/* INVLPG to clear all paging-structure caches */
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
 
-	free_page((unsigned long)pte);
+	pte_free_kernel(&init_mm, pte);
 
 	return 1;
 }