Message ID | 20240314142656.17699-1-anton@tuxera.com (mailing list archive) |
---|---|
State | New |
Series | x86/pm: Fix false positive kmemleak report in msr_build_context(). |
On 3/14/24 07:26, Anton Altaparmakov wrote:
>  /* image of the saved processor state */
>  struct saved_context {
> -	/*
> -	 * On x86_32, all segment registers except gs are saved at kernel
> -	 * entry in pt_regs.
> -	 */
> -	u16 gs;
>  	unsigned long cr0, cr2, cr3, cr4;
>  	u64 misc_enable;
>  	struct saved_msrs saved_msrs;
> @@ -27,6 +22,11 @@ struct saved_context {
>  	unsigned long tr;
>  	unsigned long safety;
>  	unsigned long return_address;
> +	/*
> +	 * On x86_32, all segment registers except gs are saved at kernel
> +	 * entry in pt_regs.
> +	 */
> +	u16 gs;
>  	bool misc_enable_saved;
>  } __attribute__((packed));

Isn't this just kinda poking at the symptoms?  This seems to be
basically the exact same bug as b0b592cf08, just with a different source
of unaligned structure members.

There's nothing to keep folks from reintroducing these kinds of issues
and evidently no way to detect when they happen without lengthy reproducers.
Hi Dave,

> On 14 Mar 2024, at 15:05, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 3/14/24 07:26, Anton Altaparmakov wrote:
>>  /* image of the saved processor state */
>>  struct saved_context {
>> -	/*
>> -	 * On x86_32, all segment registers except gs are saved at kernel
>> -	 * entry in pt_regs.
>> -	 */
>> -	u16 gs;
>>  	unsigned long cr0, cr2, cr3, cr4;
>>  	u64 misc_enable;
>>  	struct saved_msrs saved_msrs;
>> @@ -27,6 +22,11 @@ struct saved_context {
>>  	unsigned long tr;
>>  	unsigned long safety;
>>  	unsigned long return_address;
>> +	/*
>> +	 * On x86_32, all segment registers except gs are saved at kernel
>> +	 * entry in pt_regs.
>> +	 */
>> +	u16 gs;
>>  	bool misc_enable_saved;
>>  } __attribute__((packed));
>
> Isn't this just kinda poking at the symptoms?  This seems to be
> basically the exact same bug as b0b592cf08, just with a different source
> of unaligned structure members.

Yes, that is exactly the same bug.  That's how we figured out the solution in fact - it is totally the same problem with another struct member...

> There's nothing to keep folks from reintroducing these kinds of issues
> and evidently no way to detect when they happen without lengthy reproducers.

Correct.  But short of adding asserts / documentation that pointers must be aligned or kmemleak won't work, or fixing kmemleak (which I expect is not practical as it would become a lot slower if nothing else), not sure what else can be done.

Given I cannot see any alternative to fixing the kmemleak failures I think it is worth applying this fix.

Unless you have better ideas how to fix this issue?

What I can say is that we run a lot of tests with our CI and applying this fix we do not see any kmemleak issues any more whilst without it we see hundreds of the above - from a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this which is quite a lot.  With this fix applied we get zero kmemleak related failures.
Best regards, Anton
On Thu, Mar 14, 2024 at 4:05 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 3/14/24 07:26, Anton Altaparmakov wrote:
> >  /* image of the saved processor state */
> >  struct saved_context {
> > -	/*
> > -	 * On x86_32, all segment registers except gs are saved at kernel
> > -	 * entry in pt_regs.
> > -	 */
> > -	u16 gs;
> >  	unsigned long cr0, cr2, cr3, cr4;
> >  	u64 misc_enable;
> >  	struct saved_msrs saved_msrs;
> > @@ -27,6 +22,11 @@ struct saved_context {
> >  	unsigned long tr;
> >  	unsigned long safety;
> >  	unsigned long return_address;
> > +	/*
> > +	 * On x86_32, all segment registers except gs are saved at kernel
> > +	 * entry in pt_regs.
> > +	 */
> > +	u16 gs;
> >  	bool misc_enable_saved;
> >  } __attribute__((packed));
>
> Isn't this just kinda poking at the symptoms?  This seems to be
> basically the exact same bug as b0b592cf08, just with a different source
> of unaligned structure members.
>
> There's nothing to keep folks from reintroducing these kinds of issues
> and evidently no way to detect when they happen without lengthy reproducers.

This change is fine with me FWIW, but I agree that making it for
kmemleak reasons feels kind of misguided.
* Rafael J. Wysocki <rafael@kernel.org> wrote:

> On Thu, Mar 14, 2024 at 4:05 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 3/14/24 07:26, Anton Altaparmakov wrote:
> > >  /* image of the saved processor state */
> > >  struct saved_context {
> > > -	/*
> > > -	 * On x86_32, all segment registers except gs are saved at kernel
> > > -	 * entry in pt_regs.
> > > -	 */
> > > -	u16 gs;
> > >  	unsigned long cr0, cr2, cr3, cr4;
> > >  	u64 misc_enable;
> > >  	struct saved_msrs saved_msrs;
> > > @@ -27,6 +22,11 @@ struct saved_context {
> > >  	unsigned long tr;
> > >  	unsigned long safety;
> > >  	unsigned long return_address;
> > > +	/*
> > > +	 * On x86_32, all segment registers except gs are saved at kernel
> > > +	 * entry in pt_regs.
> > > +	 */
> > > +	u16 gs;
> > >  	bool misc_enable_saved;
> > >  } __attribute__((packed));
> >
> > Isn't this just kinda poking at the symptoms?  This seems to be
> > basically the exact same bug as b0b592cf08, just with a different source
> > of unaligned structure members.
> >
> > There's nothing to keep folks from reintroducing these kinds of issues
> > and evidently no way to detect when they happen without lengthy reproducers.
>
> This change is fine with me FWIW,

thx, I've added your:

  Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>

> but I agree that making it for kmemleak reasons feels kind of misguided.

Yeah, so it's a workaround, but kmemleak is also a useful debugging
facility that is finding memory leaks that static checkers are missing.

The fact that we don't have an easy way to prevent these problems from
being introduced is I think properly counterbalanced by the facts that:

 1) Only kmemleak users are inconvenienced by the false positives.

 2) kmemleak users & maintainers have created the patch. There was no
    pressure on us x86 maintainers other than to apply a root-cause
    analyzed patch.

 3) Over a timespan of ~10 years only 2 such alignment problems were
    introduced, and they were fixed by the kmemleak folks.

I think that's a fair price to pay for a useful facility. I.e. I don't
think there's any long-term maintenance burden concern.

So I've applied this workaround to x86/urgent, with a change to the title
to make sure this isn't understood as a real bug in the PM code, but a
workaround:

  37fb408c99af x86/pm: Work around false positive kmemleak report in msr_build_context()

... lemme know if you feel strongly about this. :-)

Thanks,

	Ingo
* Anton Altaparmakov <anton@tuxera.com> wrote:

> Hi Dave,
>
> > On 14 Mar 2024, at 15:05, Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 3/14/24 07:26, Anton Altaparmakov wrote:
> >>  /* image of the saved processor state */
> >>  struct saved_context {
> >> -	/*
> >> -	 * On x86_32, all segment registers except gs are saved at kernel
> >> -	 * entry in pt_regs.
> >> -	 */
> >> -	u16 gs;
> >>  	unsigned long cr0, cr2, cr3, cr4;
> >>  	u64 misc_enable;
> >>  	struct saved_msrs saved_msrs;
> >> @@ -27,6 +22,11 @@ struct saved_context {
> >>  	unsigned long tr;
> >>  	unsigned long safety;
> >>  	unsigned long return_address;
> >> +	/*
> >> +	 * On x86_32, all segment registers except gs are saved at kernel
> >> +	 * entry in pt_regs.
> >> +	 */
> >> +	u16 gs;
> >>  	bool misc_enable_saved;
> >>  } __attribute__((packed));
> >
> > Isn't this just kinda poking at the symptoms?  This seems to be
> > basically the exact same bug as b0b592cf08, just with a different source
> > of unaligned structure members.
>
> Yes, that is exactly the same bug.  That's how we figured out the solution in fact - it is totally the same problem with another struct member...
>
> > There's nothing to keep folks from reintroducing these kinds of issues
> > and evidently no way to detect when they happen without lengthy reproducers.
>
> Correct.  But short of adding asserts / documentation that pointers must be aligned or kmemleak won't work, or fixing kmemleak (which I expect is not practical as it would become a lot slower if nothing else), not sure what else can be done.
>
> Given I cannot see any alternative to fixing the kmemleak failures I think it is worth applying this fix.
>
> Unless you have better ideas how to fix this issue?
>
> What I can say is that we run a lot of tests with our CI and applying
> this fix we do not see any kmemleak issues any more whilst without it we
> see hundreds of the above - from a single, simple test run consisting of
> 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got
> 20 failures due to this which is quite a lot.  With this fix applied we
> get zero kmemleak related failures.

I turned this tidbit into the following paragraph in the commit:

  Testing:

  We run a lot of tests with our CI, and after applying this fix we do not
  see any kmemleak issues any more whilst without it we see hundreds of
  the above report. From a single, simple test run consisting of 416 individual test
  cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
  which is quite a lot. With this fix applied we get zero kmemleak related failures.

Describing the impact of a fix in a changelog is always helpful.

Thanks,

	Ingo
Hi Ingo,

> On 22 Mar 2024, at 10:03, Ingo Molnar <mingo@kernel.org> wrote:
> * Anton Altaparmakov <anton@tuxera.com> wrote:
>> Hi Dave,
>>> On 14 Mar 2024, at 15:05, Dave Hansen <dave.hansen@intel.com> wrote:
>>> On 3/14/24 07:26, Anton Altaparmakov wrote:
>>>>  /* image of the saved processor state */
>>>>  struct saved_context {
>>>> -	/*
>>>> -	 * On x86_32, all segment registers except gs are saved at kernel
>>>> -	 * entry in pt_regs.
>>>> -	 */
>>>> -	u16 gs;
>>>>  	unsigned long cr0, cr2, cr3, cr4;
>>>>  	u64 misc_enable;
>>>>  	struct saved_msrs saved_msrs;
>>>> @@ -27,6 +22,11 @@ struct saved_context {
>>>>  	unsigned long tr;
>>>>  	unsigned long safety;
>>>>  	unsigned long return_address;
>>>> +	/*
>>>> +	 * On x86_32, all segment registers except gs are saved at kernel
>>>> +	 * entry in pt_regs.
>>>> +	 */
>>>> +	u16 gs;
>>>>  	bool misc_enable_saved;
>>>>  } __attribute__((packed));
>>>
>>> Isn't this just kinda poking at the symptoms?  This seems to be
>>> basically the exact same bug as b0b592cf08, just with a different source
>>> of unaligned structure members.
>>
>> Yes, that is exactly the same bug.  That's how we figured out the solution in fact - it is totally the same problem with another struct member...
>>
>>> There's nothing to keep folks from reintroducing these kinds of issues
>>> and evidently no way to detect when they happen without lengthy reproducers.
>>
>> Correct.  But short of adding asserts / documentation that pointers must be aligned or kmemleak won't work, or fixing kmemleak (which I expect is not practical as it would become a lot slower if nothing else), not sure what else can be done.
>>
>> Given I cannot see any alternative to fixing the kmemleak failures I think it is worth applying this fix.
>>
>> Unless you have better ideas how to fix this issue?
>>
>> What I can say is that we run a lot of tests with our CI and applying
>> this fix we do not see any kmemleak issues any more whilst without it we
>> see hundreds of the above - from a single, simple test run consisting of
>> 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got
>> 20 failures due to this which is quite a lot.  With this fix applied we
>> get zero kmemleak related failures.
>
> I turned this tidbit into the following paragraph in the commit:
>
>  Testing:
>
>  We run a lot of tests with our CI, and after applying this fix we do not
>  see any kmemleak issues any more whilst without it we see hundreds of
>  the above report. From a single, simple test run consisting of 416 individual test
>  cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
>  which is quite a lot. With this fix applied we get zero kmemleak related failures.
>
> Describing the impact of a fix in a changelog is always helpful.

That's a good idea, thank you!

Also, thank you for taking the patch.  Always nice not to have to maintain too many custom kernel patches!

Best regards,

	Anton

> Thanks,
>
> 	Ingo

-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index a800abb1a992..d8416b3bf832 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -12,11 +12,6 @@
 /* image of the saved processor state */
 struct saved_context {
-	/*
-	 * On x86_32, all segment registers except gs are saved at kernel
-	 * entry in pt_regs.
-	 */
-	u16 gs;
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	struct saved_msrs saved_msrs;
@@ -27,6 +22,11 @@ struct saved_context {
 	unsigned long tr;
 	unsigned long safety;
 	unsigned long return_address;
+	/*
+	 * On x86_32, all segment registers except gs are saved at kernel
+	 * entry in pt_regs.
+	 */
+	u16 gs;
 	bool misc_enable_saved;
 } __attribute__((packed));
Since 7ee18d677989 ("x86/power: Make restore_processor_context() sane")
kmemleak reports this issue:

unreferenced object 0xf68241e0 (size 32):
  comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s)
  hex dump (first 32 bytes):
    00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00  ....)...........
    00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc  .B..............
  backtrace:
    [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260
    [<ea65e13b>] __kmalloc+0x54/0x160
    [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100
    [<46635aff>] pm_check_save_msr+0x63/0x80
    [<6b6bb938>] do_one_initcall+0x41/0x1f0
    [<3f3add60>] kernel_init_freeable+0x199/0x1e8
    [<3b538fde>] kernel_init+0x1a/0x110
    [<938ae2b2>] ret_from_fork+0x1c/0x28

Reproducer:

- Run rsync of whole kernel tree (multiple times if needed).
- start a kmemleak scan
- Note this is just an example: a lot of our internal tests hit these.

The root cause is, we expect, the same as for the equivalent fix in
commit b0b592cf0836, i.e. the alignment within the packed struct
saved_context, which has everything unaligned as there is only "u16 gs;"
at the start of the struct, where in the past there were four u16 there,
thus aligning everything afterwards.  The issue is that kmemleak only
searches for pointers that are aligned (see how pointers are scanned in
kmemleak.c), so when the struct members are not aligned it doesn't see
them.

Note we have picked this up on 5.4, 6.1 and 6.6 kernels but we expect it
is the same on all kernels >= 4.15, as the commit 7ee18d677989 which
changed from having four u16 to a single u16 at the start of the struct
was introduced in 4.15.

Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane")
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Cc: stable@vger.kernel.org
---
 arch/x86/include/asm/suspend_32.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)