diff mbox series

[v2,1/1] mm: kfence: apply kmemleak_ignore_phys on early allocated pool

Message ID 20220628113714.7792-2-yee.lee@mediatek.com (mailing list archive)
State New
Headers show
Series [v2,1/1] mm: kfence: apply kmemleak_ignore_phys on early allocated pool | expand

Commit Message

Yee Lee (李建誼) June 28, 2022, 11:37 a.m. UTC
From: Yee Lee <yee.lee@mediatek.com>

This patch solves two issues.

(1) The pool allocated by memblock needs to unregister from
kmemleak scanning. Apply kmemleak_ignore_phys to replace the
original kmemleak_free as its address now is stored in the phys tree.

(2) The pool late allocated by page-alloc doesn't need to unregister.
Move out the freeing operation from its call path.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
---
 mm/kfence/core.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Comments

Catalin Marinas June 29, 2022, 9:39 p.m. UTC | #1
On Tue, Jun 28, 2022 at 07:37:11PM +0800, yee.lee@mediatek.com wrote:
> From: Yee Lee <yee.lee@mediatek.com>
> 
> This patch solves two issues.
> 
> (1) The pool allocated by memblock needs to unregister from
> kmemleak scanning. Apply kmemleak_ignore_phys to replace the
> original kmemleak_free as its address now is stored in the phys tree.
> 
> (2) The pool late allocated by page-alloc doesn't need to unregister.
> Move out the freeing operation from its call path.
> 
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Yee Lee <yee.lee@mediatek.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Geert Uytterhoeven July 15, 2022, 8:17 a.m. UTC | #2
Hi Yee,

On Tue, Jun 28, 2022 at 1:42 PM <yee.lee@mediatek.com> wrote:
> From: Yee Lee <yee.lee@mediatek.com>
>
> This patch solves two issues.
>
> (1) The pool allocated by memblock needs to unregister from
> kmemleak scanning. Apply kmemleak_ignore_phys to replace the
> original kmemleak_free as its address now is stored in the phys tree.
>
> (2) The pool late allocated by page-alloc doesn't need to unregister.
> Move out the freeing operation from its call path.
>
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Yee Lee <yee.lee@mediatek.com>

Thank you, this fixes the storm of

    BUG: KFENCE: invalid read in scan_block+0x78/0x130
    BUG: KFENCE: use-after-free read in scan_block+0x78/0x130
    BUG: KFENCE: out-of-bounds read in scan_block+0x78/0x130

messages I was seeing on arm64.

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Andrew Morton July 15, 2022, 11:33 p.m. UTC | #3
On Fri, 15 Jul 2022 10:17:43 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> On Tue, Jun 28, 2022 at 1:42 PM <yee.lee@mediatek.com> wrote:
> > From: Yee Lee <yee.lee@mediatek.com>
> >
> > This patch solves two issues.
> >
> > (1) The pool allocated by memblock needs to unregister from
> > kmemleak scanning. Apply kmemleak_ignore_phys to replace the
> > original kmemleak_free as its address now is stored in the phys tree.
> >
> > (2) The pool late allocated by page-alloc doesn't need to unregister.
> > Move out the freeing operation from its call path.
> >
> > Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> > Suggested-by: Marco Elver <elver@google.com>
> > Signed-off-by: Yee Lee <yee.lee@mediatek.com>
> 
> Thank you, this fixes the storm of
> 
>     BUG: KFENCE: invalid read in scan_block+0x78/0x130
>     BUG: KFENCE: use-after-free read in scan_block+0x78/0x130
>     BUG: KFENCE: out-of-bounds read in scan_block+0x78/0x130
> 
> messages I was seeing on arm64.

Thanks, but...

- It would be great if we could identify a Fixes: for this.

- This patch has been accused of crashing the kernel:

	https://lkml.kernel.org/r/YsFeUHkrFTQ7T51Q@xsang-OptiPlex-9020

  Do we think that report is bogus?
Geert Uytterhoeven July 16, 2022, 6:43 p.m. UTC | #4
Hi Andrew,

On Sat, Jul 16, 2022 at 1:33 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 15 Jul 2022 10:17:43 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, Jun 28, 2022 at 1:42 PM <yee.lee@mediatek.com> wrote:
> > > From: Yee Lee <yee.lee@mediatek.com>
> > >
> > > This patch solves two issues.
> > >
> > > (1) The pool allocated by memblock needs to unregister from
> > > kmemleak scanning. Apply kmemleak_ignore_phys to replace the
> > > original kmemleak_free as its address now is stored in the phys tree.
> > >
> > > (2) The pool late allocated by page-alloc doesn't need to unregister.
> > > Move out the freeing operation from its call path.
> > >
> > > Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> > > Suggested-by: Marco Elver <elver@google.com>
> > > Signed-off-by: Yee Lee <yee.lee@mediatek.com>
> >
> > Thank you, this fixes the storm of
> >
> >     BUG: KFENCE: invalid read in scan_block+0x78/0x130
> >     BUG: KFENCE: use-after-free read in scan_block+0x78/0x130
> >     BUG: KFENCE: out-of-bounds read in scan_block+0x78/0x130
> >
> > messages I was seeing on arm64.
>
> Thanks, but...
>
> - It would be great if we could identify a Fixes: for this.

IIRC, I started seeing the issue with "[PATCH v4 3/4] mm:
kmemleak: add rbtree and store physical address for objects
allocated with PA" (i.e. commit 0c24e061196c21d5 ("mm: kmemleak:
add rbtree and store physical address for objects allocated
with PA")) of series "[PATCH v4 0/4] mm: kmemleak: store objects
allocated with physical address separately and check when scan"
(https://lore.kernel.org/all/20220611035551.1823303-1-patrick.wang.shcn@gmail.com),
in an arm64 config that had enabled kfence.
So I think this patch is sort of a dependency for that series.

I had cherry-picked that series after bisecting a regression to
commit 23c2d497de21f258 ("mm: kmemleak: take a full lowmem check in
kmemleak_*_phys()") in v5.18-rc3, and having a look around.

> - This patch has been accused of crashing the kernel:
>
>         https://lkml.kernel.org/r/YsFeUHkrFTQ7T51Q@xsang-OptiPlex-9020
>
>   Do we think that report is bogus?

I think all of this is highly architecture-specific...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Marco Elver July 18, 2022, 2:26 p.m. UTC | #5
On Sat, 16 Jul 2022 at 20:43, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
[...]
> > - This patch has been accused of crashing the kernel:
> >
> >         https://lkml.kernel.org/r/YsFeUHkrFTQ7T51Q@xsang-OptiPlex-9020
> >
> >   Do we think that report is bogus?
>
> I think all of this is highly architecture-specific...

The report can be reproduced on i386 with CONFIG_X86_PAE=y. But e.g.
mm/memblock.c:memblock_free() is also guilty of using __pa() on
previously memblock_alloc()'d addresses. Looking at the phys addr
before memblock_alloc() does virt_to_phys(), the result of __pa()
looks correct even on PAE, at least for the purpose of passing it on
to kmemleak(). So I don't know what that BUG_ON(slow_virt_to_phys() !=
phys_addr) is supposed to tell us here.

Ideas?
Catalin Marinas July 19, 2022, 11:50 a.m. UTC | #6
On Sat, Jul 16, 2022 at 08:43:06PM +0200, Geert Uytterhoeven wrote:
> On Sat, Jul 16, 2022 at 1:33 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Fri, 15 Jul 2022 10:17:43 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Tue, Jun 28, 2022 at 1:42 PM <yee.lee@mediatek.com> wrote:
> > > > From: Yee Lee <yee.lee@mediatek.com>
> > > >
> > > > This patch solves two issues.
> > > >
> > > > (1) The pool allocated by memblock needs to unregister from
> > > > kmemleak scanning. Apply kmemleak_ignore_phys to replace the
> > > > original kmemleak_free as its address now is stored in the phys tree.
> > > >
> > > > (2) The pool late allocated by page-alloc doesn't need to unregister.
> > > > Move out the freeing operation from its call path.
> > > >
> > > > Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > Suggested-by: Marco Elver <elver@google.com>
> > > > Signed-off-by: Yee Lee <yee.lee@mediatek.com>
> > >
> > > Thank you, this fixes the storm of
> > >
> > >     BUG: KFENCE: invalid read in scan_block+0x78/0x130
> > >     BUG: KFENCE: use-after-free read in scan_block+0x78/0x130
> > >     BUG: KFENCE: out-of-bounds read in scan_block+0x78/0x130
> > >
> > > messages I was seeing on arm64.
> >
> > Thanks, but...
> >
> > - It would be great if we could identify a Fixes: for this.
> 
> IIRC, I started seeing the issue with "[PATCH v4 3/4] mm:
> kmemleak: add rbtree and store physical address for objects
> allocated with PA" (i.e. commit 0c24e061196c21d5 ("mm: kmemleak:
> add rbtree and store physical address for objects allocated
> with PA")) of series "[PATCH v4 0/4] mm: kmemleak: store objects
> allocated with physical address separately and check when scan"
> (https://lore.kernel.org/all/20220611035551.1823303-1-patrick.wang.shcn@gmail.com),
> in an arm64 config that had enabled kfence.

Yes, I think it fixes 0c24e061196c21d5 since after that commit, the
kmemleak_free() no longer worked as expected on physically allocated
objects.
Andrew Morton July 19, 2022, 11:13 p.m. UTC | #7
On Mon, 18 Jul 2022 16:26:25 +0200 Marco Elver <elver@google.com> wrote:

> On Sat, 16 Jul 2022 at 20:43, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> [...]
> > > - This patch has been accused of crashing the kernel:
> > >
> > >         https://lkml.kernel.org/r/YsFeUHkrFTQ7T51Q@xsang-OptiPlex-9020
> > >
> > >   Do we think that report is bogus?
> >
> > I think all of this is highly architecture-specific...
> 
> The report can be reproduced on i386 with CONFIG_X86_PAE=y. But e.g.
> mm/memblock.c:memblock_free() is also guilty of using __pa() on
> previously memblock_alloc()'d addresses. Looking at the phys addr
> before memblock_alloc() does virt_to_phys(), the result of __pa()
> looks correct even on PAE, at least for the purpose of passing it on
> to kmemleak(). So I don't know what that BUG_ON(slow_virt_to_phys() !=
> phys_addr) is supposed to tell us here.
> 

It's only been nine years, so I'm sure Dave can remember why he added
it ;)

		BUG_ON(slow_virt_to_phys((void *)x) != phys_addr);

in arch/x86/mm/physaddr.c:__phys_addr().


This kfence patch does seem to be desirable, but we can't proceed if
it's resulting in kernel crashes.
Dave Hansen July 19, 2022, 11:22 p.m. UTC | #8
On 7/19/22 16:13, Andrew Morton wrote:
> On Mon, 18 Jul 2022 16:26:25 +0200 Marco Elver <elver@google.com> wrote:
> 
>> On Sat, 16 Jul 2022 at 20:43, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> [...]
>>>> - This patch has been accused of crashing the kernel:
>>>>
>>>>         https://lkml.kernel.org/r/YsFeUHkrFTQ7T51Q@xsang-OptiPlex-9020
>>>>
>>>>   Do we think that report is bogus?
>>> I think all of this is highly architecture-specific...
>> The report can be reproduced on i386 with CONFIG_X86_PAE=y. But e.g.
>> mm/memblock.c:memblock_free() is also guilty of using __pa() on
>> previously memblock_alloc()'d addresses. Looking at the phys addr
>> before memblock_alloc() does virt_to_phys(), the result of __pa()
>> looks correct even on PAE, at least for the purpose of passing it on
>> to kmemleak(). So I don't know what that BUG_ON(slow_virt_to_phys() !=
>> phys_addr) is supposed to tell us here.
>>
> It's only been nine years, so I'm sure Dave can remember why he added
> it ;)
> 
> 		BUG_ON(slow_virt_to_phys((void *)x) != phys_addr);
> 
> in arch/x86/mm/physaddr.c:__phys_addr().

I think I intended it to double check that the linear map is *actually*
a linear map for 'x'.  Sure, we can use the "x - PAGE_OFFSET" shortcut,
but did it turn out to be actually accurate for the address it was handed?

I'd be curious what the page tables actually say for the address that's
causing problems.
Marco Elver Aug. 1, 2022, 2:05 p.m. UTC | #9
[+x86 maintainers ...]

On Wed, 20 Jul 2022 at 01:22, Dave Hansen <dave.hansen@intel.com> wrote:
> On 7/19/22 16:13, Andrew Morton wrote:
> > On Mon, 18 Jul 2022 16:26:25 +0200 Marco Elver <elver@google.com> wrote:
> >
> >> On Sat, 16 Jul 2022 at 20:43, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >> [...]
> >>>> - This patch has been accused of crashing the kernel:
> >>>>
> >>>>         https://lkml.kernel.org/r/YsFeUHkrFTQ7T51Q@xsang-OptiPlex-9020
> >>>>
> >>>>   Do we think that report is bogus?
> >>> I think all of this is highly architecture-specific...
> >> The report can be reproduced on i386 with CONFIG_X86_PAE=y. But e.g.
> >> mm/memblock.c:memblock_free() is also guilty of using __pa() on
> >> previously memblock_alloc()'d addresses. Looking at the phys addr
> >> before memblock_alloc() does virt_to_phys(), the result of __pa()
> >> looks correct even on PAE, at least for the purpose of passing it on
> >> to kmemleak(). So I don't know what that BUG_ON(slow_virt_to_phys() !=
> >> phys_addr) is supposed to tell us here.
> >>
> > It's only been nine years, so I'm sure Dave can remember why he added
> > it ;)
> >
> >               BUG_ON(slow_virt_to_phys((void *)x) != phys_addr);
> >
> > in arch/x86/mm/physaddr.c:__phys_addr().
>
> I think I intended it to double check that the linear map is *actually*
> a linear map for 'x'.  Sure, we can use the "x - PAGE_OFFSET" shortcut,
> but did it turn out to be actually accurate for the address it was handed?
>
> I'd be curious what the page tables actually say for the address that's
> causing problems.

test robot just reminded us again:
https://lore.kernel.org/all/YufXncrWhJZH0ifB@xsang-OptiPlex-9020/T/#u

Few things I noticed:

* mm/memblock.c's memblock_free() also uses __pa() to convert back to
physical address. Presumably that's also wrong. What should be used
instead?

* kmemleak happily converts phys_addr_t to unsigned long everywhere,
but with i386 PAE, this will narrow a 64-bit address to a 32-bit
address. Is that correct? Does kmemleak need a "depends on 64BIT ||
!PHYS_ADDR_T_64BIT"?
diff mbox series

Patch

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 4e7cd4c8e687..32a4a75e820c 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -600,14 +600,6 @@  static unsigned long kfence_init_pool(void)
 		addr += 2 * PAGE_SIZE;
 	}
 
-	/*
-	 * The pool is live and will never be deallocated from this point on.
-	 * Remove the pool object from the kmemleak object tree, as it would
-	 * otherwise overlap with allocations returned by kfence_alloc(), which
-	 * are registered with kmemleak through the slab post-alloc hook.
-	 */
-	kmemleak_free(__kfence_pool);
-
 	return 0;
 }
 
@@ -620,8 +612,16 @@  static bool __init kfence_init_pool_early(void)
 
 	addr = kfence_init_pool();
 
-	if (!addr)
+	if (!addr) {
+		/*
+		 * The pool is live and will never be deallocated from this point on.
+		 * Ignore the pool object from the kmemleak phys object tree, as it would
+		 * otherwise overlap with allocations returned by kfence_alloc(), which
+		 * are registered with kmemleak through the slab post-alloc hook.
+		 */
+		kmemleak_ignore_phys(__pa(__kfence_pool));
 		return true;
+	}
 
 	/*
 	 * Only release unprotected pages, and do not try to go back and change