[v2] ptdump: add non-leaf descriptor support

Message ID 20240423142307.495726312-1-mbland@motorola.com (mailing list archive)
State New

Commit Message

Maxwell Bland April 23, 2024, 7:23 p.m. UTC
Add an optional note_non_leaf parameter to ptdump, causing note_page to
be called on non-leaf descriptors. Implement this functionality on arm64
by printing table descriptors along with table-specific permission sets.

For arm64, break (1) the uniform number of columns for each descriptor,
and (2) the coalescing of large PTE regions, which are now split up by
PMD. This is a "good" thing since it makes the behavior and protection
bits set on page tables, such as PXNTable, more explicit.
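
The control flow this adds to the generic walker can be sketched in userspace
C. This is a minimal sketch, not the kernel code: struct mock_entry,
struct mock_state, mock_note_page(), mock_visit() and mock_walk() are
hypothetical stand-ins, and only the note_non_leaf field name and the shape of
the two checks come from the diff. Level 3 is the PMD level used in the diff;
using 4 for PTEs is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one descriptor seen by the walker. */
struct mock_entry {
	int level;	/* 3 = PMD as in the diff; 4 = PTE is assumed */
	bool leaf;	/* true for block/page descriptors, false for tables */
};

/* Hypothetical stand-in for struct ptdump_state plus two counters. */
struct mock_state {
	bool note_non_leaf;	/* mirrors the new ptdump_state field */
	int notes;		/* total note_page invocations */
	int table_notes;	/* invocations made for table descriptors */
};

static void mock_note_page(struct mock_state *st, const struct mock_entry *e)
{
	st->notes++;
	if (!e->leaf)
		st->table_notes++;
}

/* Same shape as the checks added to ptdump_pmd_entry() and friends. */
static void mock_visit(struct mock_state *st, const struct mock_entry *e)
{
	assert(st && e);
	if (st->note_non_leaf && !e->leaf)
		mock_note_page(st, e);	/* new: table descriptors are reported */
	if (e->leaf)
		mock_note_page(st, e);	/* existing: leaf descriptors are reported */
}

/* Visit every entry in order and return the resulting counters. */
static struct mock_state mock_walk(bool note_non_leaf,
				   const struct mock_entry *entries, size_t n)
{
	struct mock_state st = { .note_non_leaf = note_non_leaf };

	for (size_t i = 0; i < n; i++)
		mock_visit(&st, &entries[i]);
	return st;
}
```

With the flag clear, only leaf descriptors reach the callback, so callers that
do not set it see no change in behavior.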

Examples (spaces and last attribute condensed)
Before:
0xffff008440210000-0xffff008440400000 1984K PTE ro NX SHD AF NG UXN M...
0xffff008440400000-0xffff008441c00000 24M PMD ro NX SHD AF NG BLK UXN M...
0xffff008441c00000-0xffff008441dc0000 1792K PTE ro NX SHD AF NG UXN M...
0xffff008441dc0000-0xffff00844317b000 20204K PTE RW NX SHD AF NG UXN M...

After:
0xffff0fb640200000-0xffff0fb640400000 2M PMD TBL RW x NXTbl UXNTbl M...
0xffff0fb640200000-0xffff0fb640210000 64K PTE RW NX SHD AF NG UXN M...
0xffff0fb640210000-0xffff0fb640400000 1984K PTE ro NX SHD AF NG UXN M...
0xffff0fb640400000-0xffff0fb641c00000 24M PMD BLK ro SHD AF NG NX UXN ...
0xffff0fb641c00000-0xffff0fb641e00000 2M PMD TBL RW x NXTbl UXNTbl M...
0xffff0fb641c00000-0xffff0fb641dc0000 1792K PTE ro NX SHD AF NG UXN M...
0xffff0fb641dc0000-0xffff0fb641e00000 256K PTE RW NX SHD AF NG UXN ME...

Full dumps available at
github.com/maxwell-bland/linux-patch-data/tree/main/ptdump-non-leaf

Signed-off-by: Maxwell Bland <mbland@motorola.com>
---

Dear Andrew,

> I was going to queue this while awaiting acks from arm people, but
> there's a large reject in Documentation/arch/arm64/ptdump.rst.

Ack, thank you, and apologies. If I understand correctly, you are seeing
this issue on linux-next/akpm; I was not familiar with the submission
process. I was not able to reproduce it on mm-unstable, linux-next/master,
mm/master, ... This reply (v2 commit) is cherry-picked onto
linux-next/akpm.

A diff with linux-next/master for my original submission only returns:

  611c611
  < base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
  > base-commit: 7d4768ae56014b3db93423e84f8794f173ec5c91

Regards,
Maxwell Bland

 Documentation/arch/arm64/ptdump.rst | 125 ++++++++++++++++
 arch/arm64/mm/ptdump.c              | 224 +++++++++++++++++++++++++---
 include/linux/ptdump.h              |   1 +
 mm/ptdump.c                         |  13 ++
 4 files changed, 343 insertions(+), 20 deletions(-)
 create mode 100644 Documentation/arch/arm64/ptdump.rst


base-commit: 5f9df76887bf8170e8844f1907c13fbbb30e9c36

Comments

Maxwell Bland April 24, 2024, 10:43 p.m. UTC | #1
On Wed, 24 Apr 2024 14:47:52 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:

> That seems pretty broken. Please check your mailer setup.

It was this, thank you for noticing! I am sorry for the inconvenience.
The mailer also ate newlines from the original patch, hence the issue.

To other maintainers, note this patch will not apply.
I apologize for the inconvenience and the multiple messages.
I will fix the SMTP and send correctly formatted versions next week.

Regards,
Maxwell

Alexandre Ghiti April 25, 2024, 10:22 a.m. UTC | #2
Hi Maxwell,

On Tue, Apr 23, 2024 at 9:26 PM Maxwell Bland <mbland@motorola.com> wrote:
>
> Add an optional note_non_leaf parameter to ptdump, causing note_page to
> be called on non-leaf descriptors. Implement this functionality on arm64
> by printing table descriptors along with table-specific permission sets.
>
> For arm64, break (1) the uniform number of columns for each descriptor,
> and (2) the coalescing of large PTE regions, which are now split up by
> PMD. This is a "good" thing since it makes the behavior and protection
> bits set on page tables, such as PXNTable, more explicit.
>
> Examples (spaces and last attribute condensed)
> Before:
> 0xffff008440210000-0xffff008440400000 1984K PTE ro NX SHD AF NG UXN M...
> 0xffff008440400000-0xffff008441c00000 24M PMD ro NX SHD AF NG BLK UXN M...
> 0xffff008441c00000-0xffff008441dc0000 1792K PTE ro NX SHD AF NG UXN M...
> 0xffff008441dc0000-0xffff00844317b000 20204K PTE RW NX SHD AF NG UXN M...
>
> After:
> 0xffff0fb640200000-0xffff0fb640400000 2M PMD TBL RW x NXTbl UXNTbl M...
> 0xffff0fb640200000-0xffff0fb640210000 64K PTE RW NX SHD AF NG UXN M...
> 0xffff0fb640210000-0xffff0fb640400000 1984K PTE ro NX SHD AF NG UXN M...
> 0xffff0fb640400000-0xffff0fb641c00000 24M PMD BLK ro SHD AF NG NX UXN ...
> 0xffff0fb641c00000-0xffff0fb641e00000 2M PMD TBL RW x NXTbl UXNTbl M...
> 0xffff0fb641c00000-0xffff0fb641dc0000 1792K PTE ro NX SHD AF NG UXN M...
> 0xffff0fb641dc0000-0xffff0fb641e00000 256K PTE RW NX SHD AF NG UXN ME...

I think it would be easier to read if the lower levels were tabulated,
so that we can quickly see the page table structure.
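
One possible reading of this suggestion, sketched in userspace C: indent each
line by its depth below the shallowest level being printed. This is only an
illustration; format_tabulated(), its parameters and the two-space step are
hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write 'line' into 'buf', indented two spaces for
 * every level below 'top_level'. Returns what snprintf() returns.
 */
static int format_tabulated(char *buf, size_t len, int level, int top_level,
			    const char *line)
{
	assert(buf && line && level >= top_level);

	/* "%*s" with an empty string emits exactly 'indent' spaces. */
	int indent = 2 * (level - top_level);

	return snprintf(buf, len, "%*s%s", indent, "", line);
}
```

With top_level set to the PMD level, a PMD TBL line would start at column zero
and the PTE lines it covers would be shifted right by two columns.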

I'll implement this on riscv once merged, I'm a big user of this dump :)

Thanks,

Alex

>
> Full dumps available at
> github.com/maxwell-bland/linux-patch-data/tree/main/ptdump-non-leaf
>
> Signed-off-by: Maxwell Bland <mbland@motorola.com>
> ---
>
> Dear Andrew,
>
> > I was going to queue this while awaiting acks from arm people, but
> > there's a large reject in Documentation/arch/arm64/ptdump.rst.
>
> Ack, thank you, and apologies. If I understand correctly, you are seeing
> this issue on linux-next/akpm; I was not familiar with the submission
> process. I was not able to reproduce it on mm-unstable, linux-next/master,
> mm/master, ... This reply (v2 commit) is cherry-picked onto
> linux-next/akpm.
>
> A diff with linux-next/master for my original submission only returns:
>
>   611c611
>   < base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
>   > base-commit: 7d4768ae56014b3db93423e84f8794f173ec5c91
>
> Regards,
> Maxwell Bland
>
>  Documentation/arch/arm64/ptdump.rst | 125 ++++++++++++++++
>  arch/arm64/mm/ptdump.c              | 224 +++++++++++++++++++++++++---
>  include/linux/ptdump.h              |   1 +
>  mm/ptdump.c                         |  13 ++
>  4 files changed, 343 insertions(+), 20 deletions(-)
>  create mode 100644 Documentation/arch/arm64/ptdump.rst
>
> diff --git a/Documentation/arch/arm64/ptdump.rst b/Documentation/arch/arm64/ptdump.rst
> new file mode 100644
> index 000000000000..0f38b92fd839
> --- /dev/null
> +++ b/Documentation/arch/arm64/ptdump.rst
> @@ -0,0 +1,125 @@
> +======================
> +Kernel page table dump
> +======================
> +
> +ptdump is a debugfs interface that provides a detailed dump of the kernel page
> +tables. It offers a comprehensive overview of the kernel virtual memory layout
> +as well as the attributes associated with the various regions in a
> +human-readable format. It is useful to dump the kernel page tables to verify
> +permissions and memory types. Examining the page table entries and permissions
> +helps identify potential security vulnerabilities such as mappings with overly
> +permissive access rights or improper memory protections.
> +
> +Memory hotplug allows dynamic expansion or contraction of available memory
> +without requiring a system reboot. To maintain the consistency and integrity of
> +the memory management data structures, arm64 makes use of the
> +mem_hotplug_lock semaphore in write mode. Additionally, in read mode,
> +mem_hotplug_lock supports an efficient implementation of
> +get_online_mems() and put_online_mems(). These protect the offlining of
> +memory being accessed by the ptdump code.
> +
> +In order to dump the kernel page tables, enable the following configurations
> +and mount debugfs::
> +
> + CONFIG_GENERIC_PTDUMP=y
> + CONFIG_PTDUMP_CORE=y
> + CONFIG_PTDUMP_DEBUGFS=y
> +
> + mount -t debugfs nodev /sys/kernel/debug
> + cat /sys/kernel/debug/kernel_page_tables
> +
> +On analysing the output of cat /sys/kernel/debug/kernel_page_tables one can
> +derive information about the virtual address range of a contiguous group of
> +page table entries, followed by size of the memory region covered by this
> +group, the hierarchical structure of the page tables and finally the attributes
> +associated with each page in the group. Groups are broken up either according
> +to a change in attributes or by parent descriptor, such as a PMD. Note that the
> +set of attributes, and therefore formatting, is not equivalent between entry
> +types. For example, PMD entries have a separate set of attributes from leaf
> +level PTE entries, because they support both the UXNTable and PXNTable
> +permission bits.
> +
> +The page attributes provide information about access permissions, execution
> +capability, type of mapping such as leaf level PTE or block level PGD, PMD and
> +PUD, and access status of a page within the kernel memory. Non-PTE block or
> +table level entries are denoted with either "BLK" or "TBL", respectively.
> +Assessing these attributes can assist in understanding the memory layout,
> +access patterns and security characteristics of the kernel pages.
> +
> +Kernel virtual memory layout example::
> +
> + start address        end address         size type  leaf    attributes
> + +-----------------------------------------------------------------------------------------------------------------+
> + | ---[ Linear Mapping start ]---                                                                                  |
> + | ...                                                                                                             |
> + | 0xffff0d02c3200000-0xffff0d02c3400000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL    |
> + | 0xffff0d02c3200000-0xffff0d02c3218000   96K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
> + | 0xffff0d02c3218000-0xffff0d02c3250000  224K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
> + | 0xffff0d02c3250000-0xffff0d02c33b3000 1420K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
> + | 0xffff0d02c33b3000-0xffff0d02c3400000  308K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
> + | 0xffff0d02c3400000-0xffff0d02c3600000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL    |
> + | 0xffff0d02c3400000-0xffff0d02c3600000    2M PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
> + | ...                                                                                                             |
> + | 0xffff0d02c3200000-0xffff0d02c3400000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL    |
> + | ...                                                                                                             |
> + | ---[ Linear Mapping end ]---                                                                                    |
> + +-----------------------------------------------------------------------------------------------------------------+
> + | ---[ Modules start ]---                                                                                         |
> + | ...                                                                                                             |
> + | 0xffff800000000000-0xffff800000000080 128B PGD   TBL     RW               x     UXNTbl    MEM/NORMAL            |
> + | 0xffff800000000000-0xffff800080000000   2G PUD F BLK     RW               x               MEM/NORMAL            |
> + | ...                                                                                                             |
> + | ---[ Modules end ]---                                                                                           |
> + +-----------------------------------------------------------------------------------------------------------------+
> + | ---[ vmalloc() area ]---                                                                                        |
> + | ...                                                                                                             |
> + | 0xffff800080000000-0xffff8000c0000000   1G PUD   TBL     RW               x     UXNTbl    MEM/NORMAL            |
> + | ...                                                                                                             |
> + | 0xffff800080200000-0xffff800080400000   2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL     |
> + | 0xffff800080200000-0xffff80008022f000 188K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL                  |
> + | 0xffff80008022f000-0xffff800080230000   4K PTE F BLK     RW x                       MEM/NORMAL                  |
> + | 0xffff800080230000-0xffff800080233000  12K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL                  |
> + | 0xffff800080233000-0xffff800080234000   4K PTE F BLK     RW x                       MEM/NORMAL                  |
> + | 0xffff800080234000-0xffff800080237000  12K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL                  |
> + | ...                                                                                                             |
> + | 0xffff800080400000-0xffff800084000000  60M PMD F BLK     RW               x      x     x         MEM/NORMAL     |
> + | ...                                                                                                             |
> + | ---[ vmalloc() end ]---                                                                                         |
> + +-----------------------------------------------------------------------------------------------------------------+
> + | ---[ vmemmap start ]---                                                                                         |
> + | ...                                                                                                             |
> + | 0xfffffe33cb000000-0xfffffe33cc000000  16M PMD   BLK     RW SHD AF NG     NX UXN x     x         MEM/NORMAL     |
> + | 0xfffffe33cc000000-0xfffffe3400000000 832M PMD F BLK     RW               x      x     x         MEM/NORMAL     |
> + | ...                                                                                                             |
> + | ---[ vmemmap end ]---                                                                                           |
> + +-----------------------------------------------------------------------------------------------------------------+
> + | ---[ PCI I/O start ]---                                                                                         |
> + | ...                                                                                                             |
> + | 0xffffffffc0800000-0xffffffffc0810000 64K PTE           RW NX SHD AF NG     UXN    DEVICE/nGnRE                 |
> + | ...                                                                                                             |
> + | ---[ PCI I/O end ]---                                                                                           |
> + +-----------------------------------------------------------------------------------------------------------------+
> + | ---[ Fixmap start ]---                                                                                          |
> + | ...                                                                                                             |
> + | 0xffffffffff5f6000-0xffffffffff5f9000 12K PTE           ro x  SHD AF        UXN    MEM/NORMAL                   |
> + | 0xffffffffff5f9000-0xffffffffff5fa000  4K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL                   |
> + | ...                                                                                                             |
> + | ---[ Fixmap end ]---                                                                                            |
> + +-----------------------------------------------------------------------------------------------------------------+
> +
> +cat /sys/kernel/debug/kernel_page_tables output::
> +
> + 0xffff000000000000-0xffff0d0000000000   13T PGD F BLK     RW               x               MEM/NORMAL
> + 0xffff0d0000000000-0xffff0d0000000080  128B PGD   TBL     RW               NXTbl UXNTbl    MEM/NORMAL
> + 0xffff0d0000000000-0xffff0d02c0000000   11G PUD F BLK     RW               x               MEM/NORMAL
> + 0xffff0d02c0000000-0xffff0d0300000000    1G PUD   TBL     RW               NXTbl UXNTbl    MEM/NORMAL
> + 0xffff0d02c0000000-0xffff0d02c0200000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL
> + 0xffff0d02c0000000-0xffff0d02c0200000    2M PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED
> + 0xffff0d02c0200000-0xffff0d02c0400000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL
> + 0xffff0d02c0200000-0xffff0d02c0210000   64K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED
> + 0xffff0d02c0210000-0xffff0d02c0400000 1984K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL
> + 0xffff0d02c0400000-0xffff0d02c1c00000   24M PMD   BLK     ro SHD AF NG     NX UXN x     x         MEM/NORMAL
> + 0xffff0d02c1c00000-0xffff0d02c1e00000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL
> + 0xffff0d02c1c00000-0xffff0d02c1dc0000 1792K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL
> + 0xffff0d02c1dc0000-0xffff0d02c1e00000  256K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED
> +
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index 9bc4066c5bf3..6a8b2bcc9ac7 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -24,6 +24,7 @@
>  #include <asm/memory.h>
>  #include <asm/pgtable-hwdef.h>
>  #include <asm/ptdump.h>
> +#include <asm/pgalloc.h>
>
>
>  enum address_markers_idx {
> @@ -97,6 +98,11 @@ static const struct prot_bits pte_bits[] = {
>                 .val    = PTE_VALID,
>                 .set    = " ",
>                 .clear  = "F",
> +       }, {
> +               .mask   = PTE_TABLE_BIT,
> +               .val    = PTE_TABLE_BIT,
> +               .set    = "   ",
> +               .clear  = "BLK",
>         }, {
>                 .mask   = PTE_USER,
>                 .val    = PTE_USER,
> @@ -132,11 +138,6 @@ static const struct prot_bits pte_bits[] = {
>                 .val    = PTE_CONT,
>                 .set    = "CON",
>                 .clear  = "   ",
> -       }, {
> -               .mask   = PTE_TABLE_BIT,
> -               .val    = PTE_TABLE_BIT,
> -               .set    = "   ",
> -               .clear  = "BLK",
>         }, {
>                 .mask   = PTE_UXN,
>                 .val    = PTE_UXN,
> @@ -170,34 +171,206 @@ static const struct prot_bits pte_bits[] = {
>         }
>  };
>
> +static const struct prot_bits pmd_bits[] = {
> +       {
> +               .mask   = PMD_SECT_VALID,
> +               .val    = PMD_SECT_VALID,
> +               .set    = " ",
> +               .clear  = "F",
> +       }, {
> +               .mask   = PMD_TABLE_BIT,
> +               .val    = PMD_TABLE_BIT,
> +               .set    = "TBL",
> +               .clear  = "BLK",
> +       }, {
> +               .mask   = PMD_SECT_USER,
> +               .val    = PMD_SECT_USER,
> +               .set    = "USR",
> +               .clear  = "   ",
> +       }, {
> +               .mask   = PMD_SECT_RDONLY,
> +               .val    = PMD_SECT_RDONLY,
> +               .set    = "ro",
> +               .clear  = "RW",
> +       }, {
> +               .mask   = PMD_SECT_S,
> +               .val    = PMD_SECT_S,
> +               .set    = "SHD",
> +               .clear  = "   ",
> +       }, {
> +               .mask   = PMD_SECT_AF,
> +               .val    = PMD_SECT_AF,
> +               .set    = "AF",
> +               .clear  = "  ",
> +       }, {
> +               .mask   = PMD_SECT_NG,
> +               .val    = PMD_SECT_NG,
> +               .set    = "NG",
> +               .clear  = "  ",
> +       }, {
> +               .mask   = PMD_SECT_CONT,
> +               .val    = PMD_SECT_CONT,
> +               .set    = "CON",
> +               .clear  = "   ",
> +       }, {
> +               .mask   = PMD_SECT_PXN,
> +               .val    = PMD_SECT_PXN,
> +               .set    = "NX",
> +               .clear  = "x ",
> +       }, {
> +               .mask   = PMD_SECT_UXN,
> +               .val    = PMD_SECT_UXN,
> +               .set    = "UXN",
> +               .clear  = "   ",
> +       }, {
> +               .mask   = PMD_TABLE_PXN,
> +               .val    = PMD_TABLE_PXN,
> +               .set    = "NXTbl",
> +               .clear  = "x    ",
> +       }, {
> +               .mask   = PMD_TABLE_UXN,
> +               .val    = PMD_TABLE_UXN,
> +               .set    = "UXNTbl",
> +               .clear  = "x     ",
> +       }, {
> +               .mask   = PTE_GP,
> +               .val    = PTE_GP,
> +               .set    = "GP",
> +               .clear  = "  ",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_DEVICE_nGnRnE),
> +               .set    = "DEVICE/nGnRnE",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_DEVICE_nGnRE),
> +               .set    = "DEVICE/nGnRE",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_NORMAL_NC),
> +               .set    = "MEM/NORMAL-NC",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_NORMAL),
> +               .set    = "MEM/NORMAL",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_NORMAL_TAGGED),
> +               .set    = "MEM/NORMAL-TAGGED",
> +       }
> +};
> +
> +static const struct prot_bits pud_bits[] = {
> +       {
> +               .mask   = PUD_TYPE_SECT,
> +               .val    = PUD_TYPE_SECT,
> +               .set    = " ",
> +               .clear  = "F",
> +       }, {
> +               .mask   = PUD_TABLE_BIT,
> +               .val    = PUD_TABLE_BIT,
> +               .set    = "TBL",
> +               .clear  = "BLK",
> +       }, {
> +               .mask   = PTE_USER,
> +               .val    = PTE_USER,
> +               .set    = "USR",
> +               .clear  = "   ",
> +       }, {
> +               .mask   = PUD_SECT_RDONLY,
> +               .val    = PUD_SECT_RDONLY,
> +               .set    = "ro",
> +               .clear  = "RW",
> +       }, {
> +               .mask   = PTE_SHARED,
> +               .val    = PTE_SHARED,
> +               .set    = "SHD",
> +               .clear  = "   ",
> +       }, {
> +               .mask   = PTE_AF,
> +               .val    = PTE_AF,
> +               .set    = "AF",
> +               .clear  = "  ",
> +       }, {
> +               .mask   = PTE_NG,
> +               .val    = PTE_NG,
> +               .set    = "NG",
> +               .clear  = "  ",
> +       }, {
> +               .mask   = PTE_CONT,
> +               .val    = PTE_CONT,
> +               .set    = "CON",
> +               .clear  = "   ",
> +       }, {
> +               .mask   = PUD_TABLE_PXN,
> +               .val    = PUD_TABLE_PXN,
> +               .set    = "NXTbl",
> +               .clear  = "x    ",
> +       }, {
> +               .mask   = PUD_TABLE_UXN,
> +               .val    = PUD_TABLE_UXN,
> +               .set    = "UXNTbl",
> +               .clear  = "      ",
> +       }, {
> +               .mask   = PTE_GP,
> +               .val    = PTE_GP,
> +               .set    = "GP",
> +               .clear  = "  ",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_DEVICE_nGnRnE),
> +               .set    = "DEVICE/nGnRnE",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_DEVICE_nGnRE),
> +               .set    = "DEVICE/nGnRE",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_NORMAL_NC),
> +               .set    = "MEM/NORMAL-NC",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_NORMAL),
> +               .set    = "MEM/NORMAL",
> +       }, {
> +               .mask   = PMD_ATTRINDX_MASK,
> +               .val    = PMD_ATTRINDX(MT_NORMAL_TAGGED),
> +               .set    = "MEM/NORMAL-TAGGED",
> +       }
> +};
> +
>  struct pg_level {
>         const struct prot_bits *bits;
>         const char *name;
>         size_t num;
>         u64 mask;
> +       unsigned long size;
>  };
>
>  static struct pg_level pg_level[] = {
>         { /* pgd */
>                 .name   = "PGD",
> -               .bits   = pte_bits,
> -               .num    = ARRAY_SIZE(pte_bits),
> +               .bits   = pud_bits,
> +               .num    = ARRAY_SIZE(pud_bits),
> +               .size   = PGD_SIZE
>         }, { /* p4d */
>                 .name   = "P4D",
> -               .bits   = pte_bits,
> -               .num    = ARRAY_SIZE(pte_bits),
> +               .bits   = pud_bits,
> +               .num    = ARRAY_SIZE(pud_bits),
> +               .size   = P4D_SIZE
>         }, { /* pud */
>                 .name   = (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
> -               .bits   = pte_bits,
> -               .num    = ARRAY_SIZE(pte_bits),
> +               .bits   = pud_bits,
> +               .num    = ARRAY_SIZE(pud_bits),
>         }, { /* pmd */
>                 .name   = (CONFIG_PGTABLE_LEVELS > 2) ? "PMD" : "PGD",
> -               .bits   = pte_bits,
> -               .num    = ARRAY_SIZE(pte_bits),
> +               .bits   = pmd_bits,
> +               .num    = ARRAY_SIZE(pmd_bits),
>         }, { /* pte */
>                 .name   = "PTE",
>                 .bits   = pte_bits,
>                 .num    = ARRAY_SIZE(pte_bits),
> +               .size   = PAGE_SIZE
>         },
>  };
>
> @@ -252,7 +425,7 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
>                       u64 val)
>  {
>         struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
> -       static const char units[] = "KMGTPE";
> +       static const char units[] = "BKMGTPE";
>         u64 prot = 0;
>
>         if (level >= 0)
> @@ -263,8 +436,8 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
>                 st->current_prot = prot;
>                 st->start_address = addr;
>                 pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> -       } else if (prot != st->current_prot || level != st->level ||
> -                  addr >= st->marker[1].start_address) {
> +       } else if ((prot != st->current_prot || level != st->level ||
> +                  addr >= st->marker[1].start_address)) {
>                 const char *unit = units;
>                 unsigned long delta;
>
> @@ -273,10 +446,20 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
>                         note_prot_wx(st, addr);
>                 }
>
> -               pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ",
> -                                  st->start_address, addr);
> +               /*
> +                * Entries are coalesced into a single line, so non-leaf
> +                * entries have no size relative to start_address
> +                */
> +               if (st->start_address != addr) {
> +                       pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ",
> +                                          st->start_address, addr);
> +                       delta = (addr - st->start_address);
> +               } else {
> +                       pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ", addr,
> +                                          addr + pg_level[st->level].size);
> +                       delta = (pg_level[st->level].size);
> +               }
>
> -               delta = (addr - st->start_address) >> 10;
>                 while (!(delta & 1023) && unit[1]) {
>                         delta >>= 10;
>                         unit++;
> @@ -322,7 +505,8 @@ void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
>                         .range = (struct ptdump_range[]){
>                                 {info->base_addr, end},
>                                 {0, 0}
> -                       }
> +                       },
> +                       .note_non_leaf = true
>                 }
>         };
>
> diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
> index 2a3a95586425..d32fa8515182 100644
> --- a/include/linux/ptdump.h
> +++ b/include/linux/ptdump.h
> @@ -16,6 +16,7 @@ struct ptdump_state {
>                           int level, u64 val);
>         void (*effective_prot)(struct ptdump_state *st, int level, u64 val);
>         const struct ptdump_range *range;
> +       bool note_non_leaf;
>  };
>
>  void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm, pgd_t *pgd);
> diff --git a/mm/ptdump.c b/mm/ptdump.c
> index eea3d28d173c..aacbd499ffcd 100644
> --- a/mm/ptdump.c
> +++ b/mm/ptdump.c
> @@ -40,6 +40,9 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
>         if (st->effective_prot)
>                 st->effective_prot(st, 0, pgd_val(val));
>
> +       if (st->note_non_leaf && !pgd_leaf(val))
> +               st->note_page(st, addr, 0, pgd_val(val));
> +
>         if (pgd_leaf(val)) {
>                 st->note_page(st, addr, 0, pgd_val(val));
>                 walk->action = ACTION_CONTINUE;
> @@ -63,6 +66,9 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
>         if (st->effective_prot)
>                 st->effective_prot(st, 1, p4d_val(val));
>
> +       if (st->note_non_leaf && !p4d_leaf(val))
> +               st->note_page(st, addr, 1, p4d_val(val));
> +
>         if (p4d_leaf(val)) {
>                 st->note_page(st, addr, 1, p4d_val(val));
>                 walk->action = ACTION_CONTINUE;
> @@ -86,6 +92,9 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
>         if (st->effective_prot)
>                 st->effective_prot(st, 2, pud_val(val));
>
> +       if (st->note_non_leaf && !pud_leaf(val))
> +               st->note_page(st, addr, 2, pud_val(val));
> +
>         if (pud_leaf(val)) {
>                 st->note_page(st, addr, 2, pud_val(val));
>                 walk->action = ACTION_CONTINUE;
> @@ -107,6 +116,10 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
>
>         if (st->effective_prot)
>                 st->effective_prot(st, 3, pmd_val(val));
> +
> +       if (st->note_non_leaf && !pmd_leaf(val))
> +               st->note_page(st, addr, 3, pmd_val(val));
> +
>         if (pmd_leaf(val)) {
>                 st->note_page(st, addr, 3, pmd_val(val));
>                 walk->action = ACTION_CONTINUE;
>
> base-commit: 5f9df76887bf8170e8844f1907c13fbbb30e9c36
> --
> 2.34.1
>
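
The note_page() hunk above changes the unit table to "BKMGTPE" and drops the
initial shift by 10, which is what lets sizes below 1K, such as the 128B PGD
lines in the sample dump, be printed. A minimal userspace sketch of that
scaling loop follows; format_size() and its buffer interface are hypothetical,
and only the loop body and the unit string are taken from the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace wrapper around the scaling loop from note_page():
 * render a byte count using the largest unit that divides it exactly.
 */
static int format_size(char *buf, size_t len, unsigned long long delta)
{
	static const char units[] = "BKMGTPE";
	const char *unit = units;

	assert(buf && len > 0);

	/* Step up one unit while delta is a multiple of 1024 and a unit remains. */
	while (!(delta & 1023) && unit[1]) {
		delta >>= 10;
		unit++;
	}
	return snprintf(buf, len, "%llu%c", delta, *unit);
}
```

Because a value is only scaled while it is an exact multiple of 1024, a region
of 1984 KiB prints as 1984K rather than being rounded to 1M or 2M.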

Patch

diff --git a/Documentation/arch/arm64/ptdump.rst b/Documentation/arch/arm64/ptdump.rst
new file mode 100644
index 000000000000..0f38b92fd839
--- /dev/null
+++ b/Documentation/arch/arm64/ptdump.rst
@@ -0,0 +1,125 @@ 
+======================
+Kernel page table dump
+======================
+
+ptdump is a debugfs interface that provides a detailed dump of the kernel page
+tables. It offers a comprehensive overview of the kernel virtual memory layout
+as well as the attributes associated with the various regions in a
+human-readable format. It is useful to dump the kernel page tables to verify
+permissions and memory types. Examining the page table entries and permissions
+helps identify potential security vulnerabilities such as mappings with overly
+permissive access rights or improper memory protections.
+
+Memory hotplug allows dynamic expansion or contraction of available memory
+without requiring a system reboot. To maintain the consistency and integrity of
+the memory management data structures, arm64 makes use of the
+mem_hotplug_lock semaphore in write mode. Additionally, in read mode,
+mem_hotplug_lock supports an efficient implementation of
+get_online_mems() and put_online_mems(). These protect against the offlining
+of memory being accessed by the ptdump code.
+
+In order to dump the kernel page tables, enable the following configurations
+and mount debugfs::
+
+ CONFIG_GENERIC_PTDUMP=y
+ CONFIG_PTDUMP_CORE=y
+ CONFIG_PTDUMP_DEBUGFS=y
+
+ mount -t debugfs nodev /sys/kernel/debug
+ cat /sys/kernel/debug/kernel_page_tables
+
+On analysing the output of cat /sys/kernel/debug/kernel_page_tables one can
+derive information about the virtual address range of a contiguous group of
+page table entries, followed by the size of the memory region covered by this
+group, the hierarchical structure of the page tables and finally the attributes
+associated with each page in the group. Groups are broken up either according
+to a change in attributes or by parent descriptor, such as a PMD. Note that the
+set of attributes, and therefore the formatting, differs between entry
+types. For example, PMD entries have a separate set of attributes from leaf
+level PTE entries, because they support both the UXNTable and PXNTable
+permission bits.
+
+The page attributes provide information about access permissions, execution
+capability, type of mapping such as leaf level PTE or block level PGD, PMD and
+PUD, and access status of a page within the kernel memory. Non-PTE block or
+table level entries are denoted with either "BLK" or "TBL", respectively.
+Assessing these attributes can assist in understanding the memory layout,
+access patterns and security characteristics of the kernel pages.
+
+Kernel virtual memory layout example::
+
+ start address        end address         size type  leaf    attributes
+ +-----------------------------------------------------------------------------------------------------------------+
+ | ---[ Linear Mapping start ]---                                                                                  |
+ | ...                                                                                                             |
+ | 0xffff0d02c3200000-0xffff0d02c3400000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL    |
+ | 0xffff0d02c3200000-0xffff0d02c3218000   96K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
+ | 0xffff0d02c3218000-0xffff0d02c3250000  224K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
+ | 0xffff0d02c3250000-0xffff0d02c33b3000 1420K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
+ | 0xffff0d02c33b3000-0xffff0d02c3400000  308K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
+ | 0xffff0d02c3400000-0xffff0d02c3600000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL    |
+ | 0xffff0d02c3400000-0xffff0d02c3600000    2M PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED          |
+ | ...                                                                                                             |
+ | 0xffff0d02c3200000-0xffff0d02c3400000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL    |
+ | ...                                                                                                             |
+ | ---[ Linear Mapping end ]---                                                                                    |
+ +-----------------------------------------------------------------------------------------------------------------+
+ | ---[ Modules start ]---                                                                                         |
+ | ...                                                                                                             |
+ | 0xffff800000000000-0xffff800000000080 128B PGD   TBL     RW               x     UXNTbl    MEM/NORMAL            |
+ | 0xffff800000000000-0xffff800080000000   2G PUD F BLK     RW               x               MEM/NORMAL            |
+ | ...                                                                                                             |
+ | ---[ Modules end ]---                                                                                           |
+ +-----------------------------------------------------------------------------------------------------------------+
+ | ---[ vmalloc() area ]---                                                                                        |
+ | ...                                                                                                             |
+ | 0xffff800080000000-0xffff8000c0000000   1G PUD   TBL     RW               x     UXNTbl    MEM/NORMAL            |
+ | ...                                                                                                             |
+ | 0xffff800080200000-0xffff800080400000   2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL     |
+ | 0xffff800080200000-0xffff80008022f000 188K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL                  |
+ | 0xffff80008022f000-0xffff800080230000   4K PTE F BLK     RW x                       MEM/NORMAL                  |
+ | 0xffff800080230000-0xffff800080233000  12K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL                  |
+ | 0xffff800080233000-0xffff800080234000   4K PTE F BLK     RW x                       MEM/NORMAL                  |
+ | 0xffff800080234000-0xffff800080237000  12K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL                  |
+ | ...                                                                                                             |
+ | 0xffff800080400000-0xffff800084000000  60M PMD F BLK     RW               x      x     x         MEM/NORMAL     |
+ | ...                                                                                                             |
+ | ---[ vmalloc() end ]---                                                                                         |
+ +-----------------------------------------------------------------------------------------------------------------+
+ | ---[ vmemmap start ]---                                                                                         |
+ | ...                                                                                                             |
+ | 0xfffffe33cb000000-0xfffffe33cc000000  16M PMD   BLK     RW SHD AF NG     NX UXN x     x         MEM/NORMAL     |
+ | 0xfffffe33cc000000-0xfffffe3400000000 832M PMD F BLK     RW               x      x     x         MEM/NORMAL     |
+ | ...                                                                                                             |
+ | ---[ vmemmap end ]---                                                                                           |
+ +-----------------------------------------------------------------------------------------------------------------+
+ | ---[ PCI I/O start ]---                                                                                         |
+ | ...                                                                                                             |
+ | 0xffffffffc0800000-0xffffffffc0810000 64K PTE           RW NX SHD AF NG     UXN    DEVICE/nGnRE                 |
+ | ...                                                                                                             |
+ | ---[ PCI I/O end ]---                                                                                           |
+ +-----------------------------------------------------------------------------------------------------------------+
+ | ---[ Fixmap start ]---                                                                                          |
+ | ...                                                                                                             |
+ | 0xffffffffff5f6000-0xffffffffff5f9000 12K PTE           ro x  SHD AF        UXN    MEM/NORMAL                   |
+ | 0xffffffffff5f9000-0xffffffffff5fa000  4K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL                   |
+ | ...                                                                                                             |
+ | ---[ Fixmap end ]---                                                                                            |
+ +-----------------------------------------------------------------------------------------------------------------+
+
+cat /sys/kernel/debug/kernel_page_tables output::
+
+ 0xffff000000000000-0xffff0d0000000000   13T PGD F BLK     RW               x               MEM/NORMAL
+ 0xffff0d0000000000-0xffff0d0000000080  128B PGD   TBL     RW               NXTbl UXNTbl    MEM/NORMAL
+ 0xffff0d0000000000-0xffff0d02c0000000   11G PUD F BLK     RW               x               MEM/NORMAL
+ 0xffff0d02c0000000-0xffff0d0300000000    1G PUD   TBL     RW               NXTbl UXNTbl    MEM/NORMAL
+ 0xffff0d02c0000000-0xffff0d02c0200000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL
+ 0xffff0d02c0000000-0xffff0d02c0200000    2M PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED
+ 0xffff0d02c0200000-0xffff0d02c0400000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL
+ 0xffff0d02c0200000-0xffff0d02c0210000   64K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED
+ 0xffff0d02c0210000-0xffff0d02c0400000 1984K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL
+ 0xffff0d02c0400000-0xffff0d02c1c00000   24M PMD   BLK     ro SHD AF NG     NX UXN x     x         MEM/NORMAL
+ 0xffff0d02c1c00000-0xffff0d02c1e00000    2M PMD   TBL     RW               x      NXTbl UXNTbl    MEM/NORMAL
+ 0xffff0d02c1c00000-0xffff0d02c1dc0000 1792K PTE           ro NX SHD AF NG     UXN    MEM/NORMAL
+ 0xffff0d02c1dc0000-0xffff0d02c1e00000  256K PTE           RW NX SHD AF NG     UXN    MEM/NORMAL-TAGGED
+
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 9bc4066c5bf3..6a8b2bcc9ac7 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -24,6 +24,7 @@ 
 #include <asm/memory.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/ptdump.h>
+#include <asm/pgalloc.h>
 
 
 enum address_markers_idx {
@@ -97,6 +98,11 @@  static const struct prot_bits pte_bits[] = {
 		.val	= PTE_VALID,
 		.set	= " ",
 		.clear	= "F",
+	}, {
+		.mask	= PTE_TABLE_BIT,
+		.val	= PTE_TABLE_BIT,
+		.set	= "   ",
+		.clear	= "BLK",
 	}, {
 		.mask	= PTE_USER,
 		.val	= PTE_USER,
@@ -132,11 +138,6 @@  static const struct prot_bits pte_bits[] = {
 		.val	= PTE_CONT,
 		.set	= "CON",
 		.clear	= "   ",
-	}, {
-		.mask	= PTE_TABLE_BIT,
-		.val	= PTE_TABLE_BIT,
-		.set	= "   ",
-		.clear	= "BLK",
 	}, {
 		.mask	= PTE_UXN,
 		.val	= PTE_UXN,
@@ -170,34 +171,206 @@  static const struct prot_bits pte_bits[] = {
 	}
 };
 
+static const struct prot_bits pmd_bits[] = {
+	{
+		.mask	= PMD_SECT_VALID,
+		.val	= PMD_SECT_VALID,
+		.set	= " ",
+		.clear	= "F",
+	}, {
+		.mask	= PMD_TABLE_BIT,
+		.val	= PMD_TABLE_BIT,
+		.set	= "TBL",
+		.clear	= "BLK",
+	}, {
+		.mask	= PMD_SECT_USER,
+		.val	= PMD_SECT_USER,
+		.set	= "USR",
+		.clear	= "   ",
+	}, {
+		.mask	= PMD_SECT_RDONLY,
+		.val	= PMD_SECT_RDONLY,
+		.set	= "ro",
+		.clear	= "RW",
+	}, {
+		.mask	= PMD_SECT_S,
+		.val	= PMD_SECT_S,
+		.set	= "SHD",
+		.clear	= "   ",
+	}, {
+		.mask	= PMD_SECT_AF,
+		.val	= PMD_SECT_AF,
+		.set	= "AF",
+		.clear	= "  ",
+	}, {
+		.mask	= PMD_SECT_NG,
+		.val	= PMD_SECT_NG,
+		.set	= "NG",
+		.clear	= "  ",
+	}, {
+		.mask	= PMD_SECT_CONT,
+		.val	= PMD_SECT_CONT,
+		.set	= "CON",
+		.clear	= "   ",
+	}, {
+		.mask	= PMD_SECT_PXN,
+		.val	= PMD_SECT_PXN,
+		.set	= "NX",
+		.clear	= "x ",
+	}, {
+		.mask	= PMD_SECT_UXN,
+		.val	= PMD_SECT_UXN,
+		.set	= "UXN",
+		.clear	= "   ",
+	}, {
+		.mask	= PMD_TABLE_PXN,
+		.val	= PMD_TABLE_PXN,
+		.set	= "NXTbl",
+		.clear	= "x    ",
+	}, {
+		.mask	= PMD_TABLE_UXN,
+		.val	= PMD_TABLE_UXN,
+		.set	= "UXNTbl",
+		.clear	= "x     ",
+	}, {
+		.mask	= PTE_GP,
+		.val	= PTE_GP,
+		.set	= "GP",
+		.clear	= "  ",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_DEVICE_nGnRnE),
+		.set	= "DEVICE/nGnRnE",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_DEVICE_nGnRE),
+		.set	= "DEVICE/nGnRE",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_NORMAL_NC),
+		.set	= "MEM/NORMAL-NC",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_NORMAL),
+		.set	= "MEM/NORMAL",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_NORMAL_TAGGED),
+		.set	= "MEM/NORMAL-TAGGED",
+	}
+};
+
+static const struct prot_bits pud_bits[] = {
+	{
+		.mask	= PUD_TYPE_SECT,
+		.val	= PUD_TYPE_SECT,
+		.set	= " ",
+		.clear	= "F",
+	}, {
+		.mask	= PUD_TABLE_BIT,
+		.val	= PUD_TABLE_BIT,
+		.set	= "TBL",
+		.clear	= "BLK",
+	}, {
+		.mask	= PTE_USER,
+		.val	= PTE_USER,
+		.set	= "USR",
+		.clear	= "   ",
+	}, {
+		.mask	= PUD_SECT_RDONLY,
+		.val	= PUD_SECT_RDONLY,
+		.set	= "ro",
+		.clear	= "RW",
+	}, {
+		.mask	= PTE_SHARED,
+		.val	= PTE_SHARED,
+		.set	= "SHD",
+		.clear	= "   ",
+	}, {
+		.mask	= PTE_AF,
+		.val	= PTE_AF,
+		.set	= "AF",
+		.clear	= "  ",
+	}, {
+		.mask	= PTE_NG,
+		.val	= PTE_NG,
+		.set	= "NG",
+		.clear	= "  ",
+	}, {
+		.mask	= PTE_CONT,
+		.val	= PTE_CONT,
+		.set	= "CON",
+		.clear	= "   ",
+	}, {
+		.mask	= PUD_TABLE_PXN,
+		.val	= PUD_TABLE_PXN,
+		.set	= "NXTbl",
+		.clear	= "x    ",
+	}, {
+		.mask	= PUD_TABLE_UXN,
+		.val	= PUD_TABLE_UXN,
+		.set	= "UXNTbl",
+		.clear	= "      ",
+	}, {
+		.mask	= PTE_GP,
+		.val	= PTE_GP,
+		.set	= "GP",
+		.clear	= "  ",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_DEVICE_nGnRnE),
+		.set	= "DEVICE/nGnRnE",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_DEVICE_nGnRE),
+		.set	= "DEVICE/nGnRE",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_NORMAL_NC),
+		.set	= "MEM/NORMAL-NC",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_NORMAL),
+		.set	= "MEM/NORMAL",
+	}, {
+		.mask	= PMD_ATTRINDX_MASK,
+		.val	= PMD_ATTRINDX(MT_NORMAL_TAGGED),
+		.set	= "MEM/NORMAL-TAGGED",
+	}
+};
+
 struct pg_level {
 	const struct prot_bits *bits;
 	const char *name;
 	size_t num;
 	u64 mask;
+	unsigned long size;
 };
 
 static struct pg_level pg_level[] = {
 	{ /* pgd */
 		.name	= "PGD",
-		.bits	= pte_bits,
-		.num	= ARRAY_SIZE(pte_bits),
+		.bits	= pud_bits,
+		.num	= ARRAY_SIZE(pud_bits),
+		.size	= PGD_SIZE
 	}, { /* p4d */
 		.name	= "P4D",
-		.bits	= pte_bits,
-		.num	= ARRAY_SIZE(pte_bits),
+		.bits	= pud_bits,
+		.num	= ARRAY_SIZE(pud_bits),
+		.size	= P4D_SIZE
 	}, { /* pud */
 		.name	= (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
-		.bits	= pte_bits,
-		.num	= ARRAY_SIZE(pte_bits),
+		.bits	= pud_bits,
+		.num	= ARRAY_SIZE(pud_bits),
 	}, { /* pmd */
 		.name	= (CONFIG_PGTABLE_LEVELS > 2) ? "PMD" : "PGD",
-		.bits	= pte_bits,
-		.num	= ARRAY_SIZE(pte_bits),
+		.bits	= pmd_bits,
+		.num	= ARRAY_SIZE(pmd_bits),
 	}, { /* pte */
 		.name	= "PTE",
 		.bits	= pte_bits,
 		.num	= ARRAY_SIZE(pte_bits),
+		.size	= PAGE_SIZE
 	},
 };
 
@@ -252,7 +425,7 @@  static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 		      u64 val)
 {
 	struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
-	static const char units[] = "KMGTPE";
+	static const char units[] = "BKMGTPE";
 	u64 prot = 0;
 
 	if (level >= 0)
@@ -263,8 +436,8 @@  static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 		st->current_prot = prot;
 		st->start_address = addr;
 		pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
-	} else if (prot != st->current_prot || level != st->level ||
-		   addr >= st->marker[1].start_address) {
+	} else if ((prot != st->current_prot || level != st->level ||
+		   addr >= st->marker[1].start_address)) {
 		const char *unit = units;
 		unsigned long delta;
 
@@ -273,10 +446,20 @@  static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 			note_prot_wx(st, addr);
 		}
 
-		pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ",
-				   st->start_address, addr);
+		/*
+		 * Entries are coalesced into a single line, so non-leaf
+		 * entries have no size relative to start_address
+		 */
+		if (st->start_address != addr) {
+			pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ",
+					   st->start_address, addr);
+			delta = (addr - st->start_address);
+		} else {
+			pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ", addr,
+					   addr + pg_level[st->level].size);
+			delta = (pg_level[st->level].size);
+		}
 
-		delta = (addr - st->start_address) >> 10;
 		while (!(delta & 1023) && unit[1]) {
 			delta >>= 10;
 			unit++;
@@ -322,7 +505,8 @@  void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
 			.range = (struct ptdump_range[]){
 				{info->base_addr, end},
 				{0, 0}
-			}
+			},
+			.note_non_leaf = true
 		}
 	};
 
diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
index 2a3a95586425..d32fa8515182 100644
--- a/include/linux/ptdump.h
+++ b/include/linux/ptdump.h
@@ -16,6 +16,7 @@  struct ptdump_state {
 			  int level, u64 val);
 	void (*effective_prot)(struct ptdump_state *st, int level, u64 val);
 	const struct ptdump_range *range;
+	bool note_non_leaf;
 };
 
 void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm, pgd_t *pgd);
diff --git a/mm/ptdump.c b/mm/ptdump.c
index eea3d28d173c..aacbd499ffcd 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -40,6 +40,9 @@  static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 0, pgd_val(val));
 
+	if (st->note_non_leaf && !pgd_leaf(val))
+		st->note_page(st, addr, 0, pgd_val(val));
+
 	if (pgd_leaf(val)) {
 		st->note_page(st, addr, 0, pgd_val(val));
 		walk->action = ACTION_CONTINUE;
@@ -63,6 +66,9 @@  static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 1, p4d_val(val));
 
+	if (st->note_non_leaf && !p4d_leaf(val))
+		st->note_page(st, addr, 1, p4d_val(val));
+
 	if (p4d_leaf(val)) {
 		st->note_page(st, addr, 1, p4d_val(val));
 		walk->action = ACTION_CONTINUE;
@@ -86,6 +92,9 @@  static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 2, pud_val(val));
 
+	if (st->note_non_leaf && !pud_leaf(val))
+		st->note_page(st, addr, 2, pud_val(val));
+
 	if (pud_leaf(val)) {
 		st->note_page(st, addr, 2, pud_val(val));
 		walk->action = ACTION_CONTINUE;
@@ -107,6 +116,10 @@  static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
 
 	if (st->effective_prot)
 		st->effective_prot(st, 3, pmd_val(val));
+
+	if (st->note_non_leaf && !pmd_leaf(val))
+		st->note_page(st, addr, 3, pmd_val(val));
+
 	if (pmd_leaf(val)) {
 		st->note_page(st, addr, 3, pmd_val(val));
 		walk->action = ACTION_CONTINUE;
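
The note_page() change in arch/arm64/mm/ptdump.c above drops the initial
">> 10" and prepends "B" to the units string, so the size column is now
scaled from bytes rather than from KiB. A minimal userspace sketch of that
scaling loop follows; scale_size() and format_range() are hypothetical helper
names used only for illustration, and the single-space output layout is an
assumption rather than the exact format the kernel prints.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Divide *delta by 1024 for as long as it divides evenly and a larger unit
 * exists, and return the unit character that goes with the scaled value.
 * The loop body is the same as the one in note_page().
 */
static char scale_size(unsigned long long *delta)
{
	static const char units[] = "BKMGTPE";
	const char *unit = units;

	assert(delta != NULL);
	while (!(*delta & 1023) && unit[1]) {
		*delta >>= 10;
		unit++;
	}
	return *unit;
}

/* Build "<start>-<end> <size><unit>" for an address range given in bytes. */
static int format_range(char *buf, size_t len,
			unsigned long long start, unsigned long long end)
{
	unsigned long long delta = end - start;
	char unit = scale_size(&delta);

	return snprintf(buf, len, "0x%016llx-0x%016llx %llu%c",
			start, end, delta, unit);
}
```

Under this sketch a 128-byte span stays at 128 with unit 'B', while a 2 MiB
span scales to 2 with unit 'M'. With the old "KMGTPE" string and the initial
shift, any span smaller than 1 KiB would have been shifted down to zero
before the loop started.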