x86/pm: Fix false positive kmemleak report in msr_build_context().

Message ID 20240314142656.17699-1-anton@tuxera.com (mailing list archive)
State Handled Elsewhere, archived

Commit Message

Anton Altaparmakov March 14, 2024, 2:26 p.m. UTC
Since

  7ee18d677989 ("x86/power: Make restore_processor_context() sane")

kmemleak reports this issue:

unreferenced object 0xf68241e0 (size 32):
  comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s)
  hex dump (first 32 bytes):
    00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00  ....)...........
    00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc  .B..............
  backtrace:
    [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260
    [<ea65e13b>] __kmalloc+0x54/0x160
    [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100
    [<46635aff>] pm_check_save_msr+0x63/0x80
    [<6b6bb938>] do_one_initcall+0x41/0x1f0
    [<3f3add60>] kernel_init_freeable+0x199/0x1e8
    [<3b538fde>] kernel_init+0x1a/0x110
    [<938ae2b2>] ret_from_fork+0x1c/0x28

Reproducer:

- Run rsync of the whole kernel tree (multiple times if needed).
- Start a kmemleak scan.
- Note this is just an example: a lot of our internal tests hit these.

We expect the root cause to be the same as the one addressed by the
equivalent fix in commit b0b592cf0836, i.e. the alignment within the
packed struct saved_context.  Everything in it is unaligned because
there is only a single "u16 gs;" at the start of the struct, where in
the past there were four u16 members, which aligned everything after
them.  Kmemleak only searches for pointers that are aligned (see how
pointers are scanned in kmemleak.c), so when the struct members are not
aligned it does not see them.

Note we have picked this up on 5.4, 6.1 and 6.6 kernels, but we expect
it to be the same on all kernels >= 4.15, as commit 7ee18d677989, which
changed the start of the struct from four u16 to a single u16, was
introduced in 4.15.

Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane")
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Cc: stable@vger.kernel.org
---
 arch/x86/include/asm/suspend_32.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Dave Hansen March 14, 2024, 3:05 p.m. UTC | #1
On 3/14/24 07:26, Anton Altaparmakov wrote:
>  /* image of the saved processor state */
>  struct saved_context {
> -	/*
> -	 * On x86_32, all segment registers except gs are saved at kernel
> -	 * entry in pt_regs.
> -	 */
> -	u16 gs;
>  	unsigned long cr0, cr2, cr3, cr4;
>  	u64 misc_enable;
>  	struct saved_msrs saved_msrs;
> @@ -27,6 +22,11 @@ struct saved_context {
>  	unsigned long tr;
>  	unsigned long safety;
>  	unsigned long return_address;
> +	/*
> +	 * On x86_32, all segment registers except gs are saved at kernel
> +	 * entry in pt_regs.
> +	 */
> +	u16 gs;
>  	bool misc_enable_saved;
>  } __attribute__((packed));

Isn't this just kinda poking at the symptoms?  This seems to be
basically the exact same bug as b0b592cf08, just with a different source
of unaligned structure members.

There's nothing to keep folks from reintroducing these kinds of issues
and evidently no way to detect when they happen without lengthy reproducers.
Anton Altaparmakov March 14, 2024, 3:45 p.m. UTC | #2
Hi Dave,

> On 14 Mar 2024, at 15:05, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 3/14/24 07:26, Anton Altaparmakov wrote:
>> /* image of the saved processor state */
>> struct saved_context {
>> - /*
>> - * On x86_32, all segment registers except gs are saved at kernel
>> - * entry in pt_regs.
>> - */
>> - u16 gs;
>> unsigned long cr0, cr2, cr3, cr4;
>> u64 misc_enable;
>> struct saved_msrs saved_msrs;
>> @@ -27,6 +22,11 @@ struct saved_context {
>> unsigned long tr;
>> unsigned long safety;
>> unsigned long return_address;
>> + /*
>> + * On x86_32, all segment registers except gs are saved at kernel
>> + * entry in pt_regs.
>> + */
>> + u16 gs;
>> bool misc_enable_saved;
>> } __attribute__((packed));
> 
> Isn't this just kinda poking at the symptoms?  This seems to be
> basically the exact same bug as b0b592cf08, just with a different source
> of unaligned structure members.

Yes, that is exactly the same bug.  That's how we figured out the solution in fact - it is totally the same problem with another struct member...

> There's nothing to keep folks from reintroducing these kinds of issues
> and evidently no way to detect when they happen without lengthy reproducers.

Correct.  But short of adding asserts / documentation that pointers must be aligned or kmemleak won't work, or fixing kmemleak (which I expect is not practical, as it would become a lot slower if nothing else), I am not sure what else can be done.

Given I cannot see any alternative to fixing the kmemleak failures I think it is worth applying this fix.

Unless you have better ideas how to fix this issue?

What I can say is that we run a lot of tests with our CI and applying this fix we do not see any kmemleak issues any more whilst without it we see hundreds of the above - from a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this which is quite a lot.  With this fix applied we get zero kmemleak related failures.

Best regards,

Anton
Rafael J. Wysocki March 14, 2024, 3:55 p.m. UTC | #3
On Thu, Mar 14, 2024 at 4:05 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 3/14/24 07:26, Anton Altaparmakov wrote:
> >  /* image of the saved processor state */
> >  struct saved_context {
> > -     /*
> > -      * On x86_32, all segment registers except gs are saved at kernel
> > -      * entry in pt_regs.
> > -      */
> > -     u16 gs;
> >       unsigned long cr0, cr2, cr3, cr4;
> >       u64 misc_enable;
> >       struct saved_msrs saved_msrs;
> > @@ -27,6 +22,11 @@ struct saved_context {
> >       unsigned long tr;
> >       unsigned long safety;
> >       unsigned long return_address;
> > +     /*
> > +      * On x86_32, all segment registers except gs are saved at kernel
> > +      * entry in pt_regs.
> > +      */
> > +     u16 gs;
> >       bool misc_enable_saved;
> >  } __attribute__((packed));
>
> Isn't this just kinda poking at the symptoms?  This seems to be
> basically the exact same bug as b0b592cf08, just with a different source
> of unaligned structure members.
>
> There's nothing to keep folks from reintroducing these kinds of issues
> and evidently no way to detect when they happen without lengthy reproducers.

This change is fine with me FWIW, but I agree that making it for
kmemleak reasons feels kind of misguided.
Ingo Molnar March 22, 2024, 9:58 a.m. UTC | #4
* Rafael J. Wysocki <rafael@kernel.org> wrote:

> On Thu, Mar 14, 2024 at 4:05 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 3/14/24 07:26, Anton Altaparmakov wrote:
> > >  /* image of the saved processor state */
> > >  struct saved_context {
> > > -     /*
> > > -      * On x86_32, all segment registers except gs are saved at kernel
> > > -      * entry in pt_regs.
> > > -      */
> > > -     u16 gs;
> > >       unsigned long cr0, cr2, cr3, cr4;
> > >       u64 misc_enable;
> > >       struct saved_msrs saved_msrs;
> > > @@ -27,6 +22,11 @@ struct saved_context {
> > >       unsigned long tr;
> > >       unsigned long safety;
> > >       unsigned long return_address;
> > > +     /*
> > > +      * On x86_32, all segment registers except gs are saved at kernel
> > > +      * entry in pt_regs.
> > > +      */
> > > +     u16 gs;
> > >       bool misc_enable_saved;
> > >  } __attribute__((packed));
> >
> > Isn't this just kinda poking at the symptoms?  This seems to be
> > basically the exact same bug as b0b592cf08, just with a different source
> > of unaligned structure members.
> >
> > There's nothing to keep folks from reintroducing these kinds of issues
> > and evidently no way to detect when they happen without lengthy reproducers.
> 
> This change is fine with me FWIW,

thx, I've added your:

    Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>

> but I agree that making it for kmemleak reasons feels kind of misguided.

Yeah, so it's a workaround, but kmemleak is also a useful debugging 
facility that is finding memory leaks that static checkers are missing.

The fact that we don't have an easy way to prevent these problems from 
being introduced is I think properly counterbalanced by the facts that:

  1) Only kmemleak users are inconvenienced by the false positives.

  2) kmemleak users & maintainers have created the patch. There was no 
     pressure on us x86 maintainers other than to apply a root-cause 
     analyzed patch.

  3) Over a timespan of ~10 years only 2 such alignment problems were 
     introduced, and they were fixed by the kmemleak folks. I think that's 
     a fair price to pay for a useful facility.

I.e. I don't think there's any long-term maintenance burden concern.

So I've applied this workaround to x86/urgent, with a change to the title 
to make sure this isn't understood as a real bug in the PM code, but a 
workaround:

   37fb408c99af x86/pm: Work around false positive kmemleak report in msr_build_context()

... lemme know if you feel strongly about this. :-)

Thanks,

	Ingo
Ingo Molnar March 22, 2024, 10:03 a.m. UTC | #5
* Anton Altaparmakov <anton@tuxera.com> wrote:

> Hi Dave,
> 
> > On 14 Mar 2024, at 15:05, Dave Hansen <dave.hansen@intel.com> wrote:
> > 
> > On 3/14/24 07:26, Anton Altaparmakov wrote:
> >> /* image of the saved processor state */
> >> struct saved_context {
> >> - /*
> >> - * On x86_32, all segment registers except gs are saved at kernel
> >> - * entry in pt_regs.
> >> - */
> >> - u16 gs;
> >> unsigned long cr0, cr2, cr3, cr4;
> >> u64 misc_enable;
> >> struct saved_msrs saved_msrs;
> >> @@ -27,6 +22,11 @@ struct saved_context {
> >> unsigned long tr;
> >> unsigned long safety;
> >> unsigned long return_address;
> >> + /*
> >> + * On x86_32, all segment registers except gs are saved at kernel
> >> + * entry in pt_regs.
> >> + */
> >> + u16 gs;
> >> bool misc_enable_saved;
> >> } __attribute__((packed));
> > 
> > Isn't this just kinda poking at the symptoms?  This seems to be
> > basically the exact same bug as b0b592cf08, just with a different source
> > of unaligned structure members.
> 
> Yes, that is exactly the same bug.  That's how we figured out the solution in fact - it is totally the same problem with another struct member...
> 
> > There's nothing to keep folks from reintroducing these kinds of issues
> > and evidently no way to detect when they happen without lengthy reproducers.
> 
> Correct.  But short of adding asserts / documentation that pointers must be aligned or kmemleak won't work, or fixing kmemleak (which I expect is not practical, as it would become a lot slower if nothing else), I am not sure what else can be done.
> 
> Given I cannot see any alternative to fixing the kmemleak failures I think it is worth applying this fix.
> 
> Unless you have better ideas how to fix this issue?
> 
> What I can say is that we run a lot of tests with our CI and applying 
> this fix we do not see any kmemleak issues any more whilst without it we 
> see hundreds of the above - from a single, simple test run consisting of 
> 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 
> 20 failures due to this which is quite a lot.  With this fix applied we 
> get zero kmemleak related failures.

I turned this tidbit into the following paragraph in the commit:

    Testing:
    
    We run a lot of tests with our CI, and after applying this fix we do not
    see any kmemleak issues any more whilst without it we see hundreds of
    the above report. From a single, simple test run consisting of 416 individual test
    cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
    which is quite a lot. With this fix applied we get zero kmemleak related failures.

Describing the impact of a fix in a changelog is always helpful.

Thanks,

	Ingo
Anton Altaparmakov March 22, 2024, 10:13 a.m. UTC | #6
Hi Ingo,

> On 22 Mar 2024, at 10:03, Ingo Molnar <mingo@kernel.org> wrote:
> * Anton Altaparmakov <anton@tuxera.com> wrote:
>> Hi Dave,
>>> On 14 Mar 2024, at 15:05, Dave Hansen <dave.hansen@intel.com> wrote:
>>> On 3/14/24 07:26, Anton Altaparmakov wrote:
>>>> /* image of the saved processor state */
>>>> struct saved_context {
>>>> - /*
>>>> - * On x86_32, all segment registers except gs are saved at kernel
>>>> - * entry in pt_regs.
>>>> - */
>>>> - u16 gs;
>>>> unsigned long cr0, cr2, cr3, cr4;
>>>> u64 misc_enable;
>>>> struct saved_msrs saved_msrs;
>>>> @@ -27,6 +22,11 @@ struct saved_context {
>>>> unsigned long tr;
>>>> unsigned long safety;
>>>> unsigned long return_address;
>>>> + /*
>>>> + * On x86_32, all segment registers except gs are saved at kernel
>>>> + * entry in pt_regs.
>>>> + */
>>>> + u16 gs;
>>>> bool misc_enable_saved;
>>>> } __attribute__((packed));
>>> 
>>> Isn't this just kinda poking at the symptoms?  This seems to be
>>> basically the exact same bug as b0b592cf08, just with a different source
>>> of unaligned structure members.
>> 
>> Yes, that is exactly the same bug.  That's how we figured out the solution in fact - it is totally the same problem with another struct member...
>> 
>>> There's nothing to keep folks from reintroducing these kinds of issues
>>> and evidently no way to detect when they happen without lengthy reproducers.
>> 
>> Correct.  But short of adding asserts / documentation that pointers must be aligned or kmemleak won't work, or fixing kmemleak (which I expect is not practical, as it would become a lot slower if nothing else), I am not sure what else can be done.
>> 
>> Given I cannot see any alternative to fixing the kmemleak failures I think it is worth applying this fix.
>> 
>> Unless you have better ideas how to fix this issue?
>> 
>> What I can say is that we run a lot of tests with our CI and applying 
>> this fix we do not see any kmemleak issues any more whilst without it we 
>> see hundreds of the above - from a single, simple test run consisting of 
>> 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 
>> 20 failures due to this which is quite a lot.  With this fix applied we 
>> get zero kmemleak related failures.
> 
> I turned this tidbit into the following paragraph in the commit:
> 
>    Testing:
> 
>    We run a lot of tests with our CI, and after applying this fix we do not
>    see any kmemleak issues any more whilst without it we see hundreds of
>    the above report. From a single, simple test run consisting of 416 individual test
>    cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
>    which is quite a lot. With this fix applied we get zero kmemleak related failures.
> 
> Describing the impact of a fix in a changelog is always helpful.

That's a good idea, thank you!  Also, thank you for taking the patch.  Always nice not to have to maintain too many custom kernel patches!

Best regards,

Anton

> 
> Thanks,
> 
> Ingo

Patch

diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index a800abb1a992..d8416b3bf832 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -12,11 +12,6 @@ 
 
 /* image of the saved processor state */
 struct saved_context {
-	/*
-	 * On x86_32, all segment registers except gs are saved at kernel
-	 * entry in pt_regs.
-	 */
-	u16 gs;
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	struct saved_msrs saved_msrs;
@@ -27,6 +22,11 @@  struct saved_context {
 	unsigned long tr;
 	unsigned long safety;
 	unsigned long return_address;
+	/*
+	 * On x86_32, all segment registers except gs are saved at kernel
+	 * entry in pt_regs.
+	 */
+	u16 gs;
 	bool misc_enable_saved;
 } __attribute__((packed));