[00/10] Account page tables at all levels

Message ID 20241219164425.2277022-1-kevin.brodsky@arm.com (mailing list archive)

Message

Kevin Brodsky Dec. 19, 2024, 4:44 p.m. UTC
We currently have a pair of ctor/dtor calls for lower page table levels,
up to PUD. At PTE and PMD level, these handle split locks,
if supported. Additionally, the helpers ensure correct accounting of
page table pages to the corresponding process.

This series takes that principle to its logical conclusion: account all
page table pages, at all levels and on all architectures (see caveat
below), through suitable ctor/dtor calls. This means concretely:

* Ensuring that the existing pagetable_{pte,pmd,pud}_[cd]tor are called
  on all architectures.

* Introducing pagetable_{p4d,pgd}_[cd]tor and calling them at P4D/PGD level.

The primary motivation for this series is not page accounting, though.
P4D/PGD-level pages represent a tiny proportion of the memory used by a
process. Rather, the appeal comes from the introduction of a single,
generic place where construction/destruction hooks can be called for all
page table pages at all levels. This will come in handy for protecting
page tables using kpkeys [1]. Peter Zijlstra suggested this approach [2]
to avoid handling this in arch code.

With this series, __pagetable_ctor() and __pagetable_dtor() (introduced
in patch 1) should be called when page tables are allocated/freed at any
level on any architecture. Note however that only P*D that consist of
one or more regular pages are handled. This excludes:

* All P*D allocated from a kmem_cache (or kmalloc).
* P*D that are not allocated through the page allocator (GFP); this is
  only an issue on sparc.

The table at the end of this email gives more details for each
architecture.
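
As an illustration of the layering described above, here is a minimal
userspace sketch. It is not the kernel code: struct model_ptdesc, the
model_* functions and the counter are hypothetical stand-ins, and the
split lock is reduced to a flag. It only shows the shape of the idea: one
common ctor/dtor pair that every level goes through, with level-specific
work kept in thin wrappers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's page table descriptor. */
struct model_ptdesc {
	bool accounted;		/* set while the page counts as a page table */
	bool has_split_lock;	/* only meaningful at PTE/PMD level */
};

/* Models the counter of page table pages charged to a process. */
static long model_nr_pgtable_pages;

/* Common part, shared by every level. */
static inline void model_pagetable_ctor(struct model_ptdesc *pt)
{
	assert(!pt->accounted);
	pt->accounted = true;
	model_nr_pgtable_pages++;
}

static inline void model_pagetable_dtor(struct model_ptdesc *pt)
{
	assert(pt->accounted);
	pt->accounted = false;
	model_nr_pgtable_pages--;
}

/* PTE (and PMD) level: level-specific lock setup, then the common part. */
static inline void model_pte_ctor(struct model_ptdesc *pt)
{
	pt->has_split_lock = true;
	model_pagetable_ctor(pt);
}

static inline void model_pte_dtor(struct model_ptdesc *pt)
{
	pt->has_split_lock = false;
	model_pagetable_dtor(pt);
}

/* Upper levels (PUD, P4D, PGD): nothing level-specific, only the common part. */
static inline void model_pgd_ctor(struct model_ptdesc *pt)
{
	model_pagetable_ctor(pt);
}

static inline void model_pgd_dtor(struct model_ptdesc *pt)
{
	model_pagetable_dtor(pt);
}
```

In this model, any hook added to model_pagetable_ctor() or
model_pagetable_dtor() runs for page tables at every level, which is the
property the series is after.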

Patches in detail:

* Patch 1 factors out the common implementation of all
  pagetable_*_[cd]tor.

* Patches 2-4: PMD/PUD; add missing calls to pagetable_{pmd,pud}_[cd]tor
  on various architectures.

* Patches 5-7: P4D; move most architectures to generic alloc/free functions
  at P4D level, and then have them call pagetable_p4d_[cd]tor.

* Patches 8-10: PGD; same principle at PGD level.
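
The P4D/PGD approach (generic alloc/free functions that call the ctor/dtor
themselves) can be sketched in userspace as follows. This is only a model
under stated assumptions: calloc() stands in for the page allocator,
MODEL_PAGE_SIZE and all model_* names are hypothetical, and the real
helpers take additional arguments not shown here.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Models the number of live, accounted page table pages. */
static long model_nr_tables;

static inline void model_ctor(void *table)
{
	(void)table;
	model_nr_tables++;
}

static inline void model_dtor(void *table)
{
	(void)table;
	assert(model_nr_tables > 0);
	model_nr_tables--;
}

/*
 * Generic allocation helper: the ctor call lives here, so an architecture
 * that uses the generic helper needs no ctor call of its own.
 */
static inline void *model_p4d_alloc_one(void)
{
	/* Zeroed allocation; calloc() is a stand-in for the page allocator. */
	void *table = calloc(1, MODEL_PAGE_SIZE);

	if (!table)
		return NULL;

	model_ctor(table);
	return table;
}

/* Generic free helper: dtor first, then release the memory. */
static inline void model_p4d_free(void *table)
{
	if (!table)
		return;

	model_dtor(table);
	free(table);
}
```

The point of the sketch is the placement of the calls: once allocation and
freeing go through shared helpers, the ctor/dtor pairing is enforced in a
single place rather than repeated per architecture.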

The patches were build-tested on all architectures (thanks Linus Walleij
for triggering the LKP CI for me!), and boot-tested on arm64 and x86_64.

- Kevin

[1] https://lore.kernel.org/linux-hardening/20241206101110.1646108-1-kevin.brodsky@arm.com/
[2] https://lore.kernel.org/linux-hardening/20241210122355.GN8562@noisy.programming.kicks-ass.net/
---

Overview of the situation on all architectures after this series is applied:

  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | arch          | #include                | Complete ctor/dtor    | ctor/dtor    | Notes                              |
  |               | <asm-generic/pgalloc.h> | calls up to p4d level | at pgd level |                                    |
  +===============+=========================+=======================+==============+====================================+
  | alpha         | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | arc           | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | arm           | Y                       | Y                     | Y/N          | kmalloc at pgd level if LPAE       |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | arm64         | Y                       | Y                     | Y/N          | kmem_cache if pgd not page-sized   |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | csky          | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | hexagon       | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | loongarch     | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | m68k (Sun3)   | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | m68k (others) | N                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | microblaze    | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | mips          | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | nios2         | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | openrisc      | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | parisc        | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | powerpc       | N                       | Y/N                   | N            | kmem_cache at:                     |
  |               |                         |                       |              | - pgd level                        |
  |               |                         |                       |              | - pud level in 64-bit              |
  |               |                         |                       |              | - pmd level in 64-bit on !book3s   |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | riscv         | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | s390          | N                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | sh            | Y                       | N                     | N            | kmem_cache at pmd/pgd level        |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | sparc         | N                       | N                     | N            | 32-bit: special memory             |
  |               |                         |                       |              | 64-bit: kmem_cache above pte level |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | um            | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | x86           | Y                       | Y                     | Y/N          | kmem_cache at pgd level if PAE     |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+
  | xtensa        | Y                       | Y                     | Y            |                                    |
  +---------------+-------------------------+-----------------------+--------------+------------------------------------+

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-hexagon@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-openrisc@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-um@lists.infradead.org
Cc: loongarch@lists.linux.dev
Cc: x86@kernel.org
---
Kevin Brodsky (10):
  mm: Move common parts of pagetable_*_[cd]tor to helpers
  parisc: mm: Ensure pagetable_pmd_[cd]tor are called
  m68k: mm: Add calls to pagetable_pmd_[cd]tor
  s390/mm: Add calls to pagetable_pud_[cd]tor
  riscv: mm: Skip pgtable level check in {pud,p4d}_alloc_one
  asm-generic: pgalloc: Provide generic p4d_{alloc_one,free}
  mm: Introduce ctor/dtor at P4D level
  ARM: mm: Rename PGD helpers
  asm-generic: pgalloc: Provide generic __pgd_{alloc,free}
  mm: Introduce ctor/dtor at PGD level

 arch/alpha/mm/init.c                     |  2 +-
 arch/arc/include/asm/pgalloc.h           |  9 +--
 arch/arm/mm/pgd.c                        | 16 +++--
 arch/arm64/include/asm/pgalloc.h         | 17 ------
 arch/arm64/mm/pgd.c                      |  4 +-
 arch/csky/include/asm/pgalloc.h          |  2 +-
 arch/hexagon/include/asm/pgalloc.h       |  2 +-
 arch/loongarch/mm/pgtable.c              |  7 +--
 arch/m68k/include/asm/mcf_pgalloc.h      |  2 +
 arch/m68k/include/asm/motorola_pgalloc.h |  6 +-
 arch/m68k/include/asm/sun3_pgalloc.h     |  2 +-
 arch/m68k/mm/motorola.c                  | 31 ++++++++--
 arch/microblaze/include/asm/pgalloc.h    |  7 +--
 arch/mips/include/asm/pgalloc.h          |  6 --
 arch/mips/mm/pgtable.c                   |  8 +--
 arch/nios2/mm/pgtable.c                  |  3 +-
 arch/openrisc/include/asm/pgalloc.h      |  6 +-
 arch/parisc/include/asm/pgalloc.h        | 39 ++++--------
 arch/riscv/include/asm/pgalloc.h         | 46 ++------------
 arch/s390/include/asm/pgalloc.h          | 33 +++++++---
 arch/um/kernel/mem.c                     |  7 +--
 arch/x86/include/asm/pgalloc.h           | 18 ------
 arch/x86/mm/pgtable.c                    | 27 +++++----
 arch/xtensa/include/asm/pgalloc.h        |  2 +-
 include/asm-generic/pgalloc.h            | 76 +++++++++++++++++++++++-
 include/linux/mm.h                       | 64 +++++++++++++-------
 26 files changed, 234 insertions(+), 208 deletions(-)


base-commit: 78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8

Comments

Dave Hansen Dec. 19, 2024, 5:13 p.m. UTC | #1
On 12/19/24 08:44, Kevin Brodsky wrote:
>   +---------------+-------------------------+-----------------------+--------------+------------------------------------+
>   | x86           | Y                       | Y                     | Y/N          | kmem_cache at pgd level if PAE     |
>   +---------------+-------------------------+-----------------------+--------------+------------------------------------+

This is a really rare series that adds functionality _and_ removes code
overall. It looks really good to me. The x86 implementation seems to be
captured just fine in the generic one:

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

One super tiny nit is that the PAE pgd _can_ be allocated using
__get_free_pages(). It was originally there for Xen, but I think it's
being used for PTI only at this point and the comments are wrong-ish.

I kinda think we should just get rid of the 32-bit kmem_cache entirely.

Kevin Brodsky Dec. 20, 2024, 10:58 a.m. UTC | #2
On 19/12/2024 18:13, Dave Hansen wrote:
> On 12/19/24 08:44, Kevin Brodsky wrote:
>>   +---------------+-------------------------+-----------------------+--------------+------------------------------------+
>>   | x86           | Y                       | Y                     | Y/N          | kmem_cache at pgd level if PAE     |
>>   +---------------+-------------------------+-----------------------+--------------+------------------------------------+
> This is a really rare series that adds functionality _and_ removes code
> overall. It looks really good to me. The x86 implementation seems to be
> captured just fine in the generic one:

Thank you for the review, much appreciated!

> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

Just to double-check, are you ack'ing the x86 changes specifically? If
so I'll add your Acked-by on patches 6, 7 and 9.

> One super tiny nit is that the PAE pgd _can_ be allocated using
> __get_free_pages(). It was originally there for Xen, but I think it's
> being used for PTI only at this point and the comments are wrong-ish.
>
> I kinda think we should just get rid of the 32-bit kmem_cache entirely.

That would certainly simplify things on the x86 side! I'm not at all
familiar with that code though, would you be happy with providing a
patch? I could add it to this series if that's convenient.

- Kevin

Dave Hansen Dec. 20, 2024, 2:45 p.m. UTC | #3
On 12/20/24 02:58, Kevin Brodsky wrote:
>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> Just to double-check, are you ack'ing the x86 changes specifically? If
> so I'll add your Acked-by on patches 6, 7 and 9.

Feel free to add it to each patch in the series.

Dave Hansen Dec. 20, 2024, 7:31 p.m. UTC | #4
On 12/20/24 02:58, Kevin Brodsky wrote:
>> One super tiny nit is that the PAE pgd _can_ be allocated using
>> __get_free_pages(). It was originally there for Xen, but I think it's
>> being used for PTI only at this point and the comments are wrong-ish.
>>
>> I kinda think we should just get rid of the 32-bit kmem_cache entirely.
> That would certainly simplify things on the x86 side! I'm not at all
> familiar with that code though, would you be happy with providing a
> patch? I could add it to this series if that's convenient.

I hacked this together yesterday:

https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/log/?h=simplify-pae-20241220

It definitely needs some more work. I'm particularly still puzzling
about why SHARED_KERNEL_PMD is used both as a trigger for 32b vs.
PAGE_SIZE PAE pgd allocations _and_ for the actual PMD sharing.

Xen definitely needed the whole page behavior but I'm not sure why PTI did.

Either way, that series should make the PAE PGDs a _bit_ less weird at
the cost of an extra ~2 pages per process for folks who are running
32-bit PAE kernels with PTI disabled.

But I think the diffstat is worth it:

 5 files changed, 16 insertions(+), 96 deletions(-)