diff mbox series

[v5,03/38] kmsan: gfp: introduce __GFP_NO_KMSAN_SHADOW

Message ID 20200325161249.55095-4-glider@google.com (mailing list archive)
State New, archived
Headers show
Series Add KernelMemorySanitizer infrastructure | expand

Commit Message

Alexander Potapenko March 25, 2020, 4:12 p.m. UTC
This flag is to be used by KMSAN runtime to mark that newly created
memory pages don't need KMSAN metadata backing them.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Alexander Potapenko <glider@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: linux-mm@kvack.org

---
We can't decide what to do here:
 - do we need to conditionally define ___GFP_NO_KMSAN_SHADOW depending on
   CONFIG_KMSAN like LOCKDEP does?
 - if KMSAN is defined, and LOCKDEP is not, do we want to "compactify" the GFP
   bits?

Change-Id: If5d0352fd5711ad103328e2c185eb885e826423a
---
 include/linux/gfp.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Michal Hocko March 25, 2020, 4:19 p.m. UTC | #1
On Wed 25-03-20 17:12:14, glider@google.com wrote:
> This flag is to be used by KMSAN runtime to mark that newly created
> memory pages don't need KMSAN metadata backing them.

I really dislike an idea of the gfp flag. If you need some form of
exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
History tells us that single usecase gfp flags are too tempting to abuse
and using incorrectly.

> Signed-off-by: Alexander Potapenko <glider@google.com>
> To: Alexander Potapenko <glider@google.com>
> Cc: Vegard Nossum <vegard.nossum@oracle.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Marco Elver <elver@google.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Cc: linux-mm@kvack.org
> 
> ---
> We can't decide what to do here:
>  - do we need to conditionally define ___GFP_NO_KMSAN_SHADOW depending on
>    CONFIG_KMSAN like LOCKDEP does?
>  - if KMSAN is defined, and LOCKDEP is not, do we want to "compactify" the GFP
>    bits?
> 
> Change-Id: If5d0352fd5711ad103328e2c185eb885e826423a
> ---
>  include/linux/gfp.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index be2754841369e..e1ab42b5e9ce2 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -44,6 +44,7 @@ struct vm_area_struct;
>  #else
>  #define ___GFP_NOLOCKDEP	0
>  #endif
> +#define ___GFP_NO_KMSAN_SHADOW  0x1000000u
>  /* If the above are modified, __GFP_BITS_SHIFT may need updating */
>  
>  /*
> @@ -212,12 +213,13 @@ struct vm_area_struct;
>  #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
>  #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
>  #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
> +#define __GFP_NO_KMSAN_SHADOW  ((__force gfp_t)___GFP_NO_KMSAN_SHADOW)
>  
>  /* Disable lockdep for GFP context tracking */
>  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>  
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT (25)
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /**
> -- 
> 2.25.1.696.g5e7596f4ac-goog
>
Alexander Potapenko March 25, 2020, 5:26 p.m. UTC | #2
On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 25-03-20 17:12:14, glider@google.com wrote:
> > This flag is to be used by KMSAN runtime to mark that newly created
> > memory pages don't need KMSAN metadata backing them.
>
> I really dislike an idea of the gfp flag. If you need some form of
> exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
> History tells us that single usecase gfp flags are too tempting to abuse
> and using incorrectly.

Great idea, will do!
Guess PF_ flags isn't a scarce resource?
Alexander Potapenko March 25, 2020, 5:40 p.m. UTC | #3
On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 25-03-20 17:12:14, glider@google.com wrote:
> > > This flag is to be used by KMSAN runtime to mark that newly created
> > > memory pages don't need KMSAN metadata backing them.
> >
> > I really dislike an idea of the gfp flag. If you need some form of
> > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
> > History tells us that single usecase gfp flags are too tempting to abuse
> > and using incorrectly.
>
> Great idea, will do!
> Guess PF_ flags isn't a scarce resource?

Actually, no, we are out of bits in current->flags already.
I could introduce a separate flag into struct task, but that won't
work in interrupt contexts - how do you solve that problem for FS/IO
allocations?
Michal Hocko March 25, 2020, 5:43 p.m. UTC | #4
On Wed 25-03-20 18:26:34, Alexander Potapenko wrote:
> On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 25-03-20 17:12:14, glider@google.com wrote:
> > > This flag is to be used by KMSAN runtime to mark that newly created
> > > memory pages don't need KMSAN metadata backing them.
> >
> > I really dislike an idea of the gfp flag. If you need some form of
> > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
> > History tells us that single usecase gfp flags are too tempting to abuse
> > and using incorrectly.
> 
> Great idea, will do!
> Guess PF_ flags isn't a scarce resource?

task_struct is a monster data structure and there are surely holes you
can fit your flag in. All you need is to just store it somewhere and
then check for it wherever you hook your infrastructure into the page
or slab allocator. The primary thing is to avoid a gfp flag.

Thanks!
Matthew Wilcox March 25, 2020, 5:49 p.m. UTC | #5
On Wed, Mar 25, 2020 at 06:40:29PM +0100, Alexander Potapenko wrote:
> On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote:
> >
> > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 25-03-20 17:12:14, glider@google.com wrote:
> > > > This flag is to be used by KMSAN runtime to mark that newly created
> > > > memory pages don't need KMSAN metadata backing them.
> > >
> > > I really dislike an idea of the gfp flag. If you need some form of
> > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
> > > History tells us that single usecase gfp flags are too tempting to abuse
> > > and using incorrectly.
> >
> > Great idea, will do!
> > Guess PF_ flags isn't a scarce resource?
> 
> Actually, no, we are out of bits in current->flags already.
> I could introduce a separate flag into struct task, but that won't
> work in interrupt contexts - how do you solve that problem for FS/IO
> allocations?

I would suggest using bits in the section labelled:

        /* Unserialized, strictly 'current' */

since this doesn't need to be accessed from any other task.  Michal,
can we move PF_MEMALLOC_NOIO and NOFS to that area too?
Alexander Potapenko March 25, 2020, 6:03 p.m. UTC | #6
> I would suggest using bits in the section labelled:
>
>         /* Unserialized, strictly 'current' */

The main problem is that |current| is unavailable in the interrupt
context, so we'll also need to:
 - disable interrupts when preparing for a KMSAN internal memory
allocation - sounds costly, huh?
 - store the context flag in a per-cpu variable in the case |current|
is unavailable.
Matthew Wilcox March 25, 2020, 6:09 p.m. UTC | #7
On Wed, Mar 25, 2020 at 07:03:32PM +0100, Alexander Potapenko wrote:
> > I would suggest using bits in the section labelled:
> >
> >         /* Unserialized, strictly 'current' */
> 
> The main problem is that |current| is unavailable in the interrupt
> context, so we'll also need to:
>  - disable interrupts when preparing for a KMSAN internal memory
> allocation - sounds costly, huh?
>  - store the context flag in a per-cpu variable in the case |current|
> is unavailable.

It's not /unavailable/ ... it's whatever task happens to be running at
the time the interrupt is triggered.  You can borrow its task_struct.
You'll have to save off the current value of the flag before setting it,
just like memalloc_nofs_save() does.

But this does rather call into question whether Michal's advice to use
task_struct is good advice to begin with.  For memalloc_nofs/noio,
it works well this way because allocations in interrupt context are
inherently at a more restrictive context than task level.  It's not clear
to me what this kmsan GFP flag is being used for, and whether allocations
that happen in interrupt context should inherit the kmsan setting.
I will have to read these patches more carefully to determine that;
I was really just responding to the "where can I find some free bits"
part of the question.
Alexander Potapenko March 25, 2020, 6:30 p.m. UTC | #8
On Wed, Mar 25, 2020 at 7:09 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Mar 25, 2020 at 07:03:32PM +0100, Alexander Potapenko wrote:
> > > I would suggest using bits in the section labelled:
> > >
> > >         /* Unserialized, strictly 'current' */
> >
> > The main problem is that |current| is unavailable in the interrupt
> > context, so we'll also need to:
> >  - disable interrupts when preparing for a KMSAN internal memory
> > allocation - sounds costly, huh?
> >  - store the context flag in a per-cpu variable in the case |current|
> > is unavailable.
>
> It's not /unavailable/ ... it's whatever task happens to be running at
> the time the interrupt is triggered.  You can borrow its task_struct.

That didn't come to my mind, interesting!

> You'll have to save off the current value of the flag before setting it,
> just like memalloc_nofs_save() does.
> But this does rather call into question whether Michal's advice to use
> task_struct is good advice to begin with.  For memalloc_nofs/noio,
> it works well this way because allocations in interrupt context are
> inherently at a more restrictive context than task level.  It's not clear
> to me what this kmsan GFP flag is being used for, and whether allocations
> that happen in interrupt context should inherit the kmsan setting.

The idea behind this flag is as follows.
KMSAN must allocate metadata pages for every allocation performed by
alloc_pages().
Metadata allocations are also done via alloc_pages(), and for those no
further metadata needs to be allocated.
Thus the GFP flag is used to prevent the recursion in alloc_pages().

It is theoretically possible that a less restrictive allocation
(without __GFP_NO_KMSAN_SHADOW) happens in an interrupt on top of a
task performing a more restrictive allocation (with
__GFP_NO_KMSAN_SHADOW).

> I will have to read these patches more carefully to determine that;
> I was really just responding to the "where can I find some free bits"
> part of the question.

Thanks for clarification.
Michal Hocko March 25, 2020, 6:38 p.m. UTC | #9
On Wed 25-03-20 18:40:29, Alexander Potapenko wrote:
> On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote:
> >
> > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 25-03-20 17:12:14, glider@google.com wrote:
> > > > This flag is to be used by KMSAN runtime to mark that newly created
> > > > memory pages don't need KMSAN metadata backing them.
> > >
> > > I really dislike an idea of the gfp flag. If you need some form of
> > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
> > > History tells us that single usecase gfp flags are too tempting to abuse
> > > and using incorrectly.
> >
> > Great idea, will do!
> > Guess PF_ flags isn't a scarce resource?
> 
> Actually, no, we are out of bits in current->flags already.
> I could introduce a separate flag into struct task, but that won't
> work in interrupt contexts - how do you solve that problem for FS/IO
> allocations?

NOFS/NOIO is not a problem for IRQ context because we never do reclaim
from that context.

I was also not aware that there are users from the IRQ context. I
thought this would be an internal KMSAN stuff. What would be the IRQ
context you call this from?

Anyway, if you cannot go with the task_struct then a per-cpu value
should work, right?
Michal Hocko March 25, 2020, 6:40 p.m. UTC | #10
On Wed 25-03-20 10:49:16, Matthew Wilcox wrote:
> On Wed, Mar 25, 2020 at 06:40:29PM +0100, Alexander Potapenko wrote:
> > On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote:
> > >
> > > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Wed 25-03-20 17:12:14, glider@google.com wrote:
> > > > > This flag is to be used by KMSAN runtime to mark that newly created
> > > > > memory pages don't need KMSAN metadata backing them.
> > > >
> > > > I really dislike an idea of the gfp flag. If you need some form of
> > > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
> > > > History tells us that single usecase gfp flags are too tempting to abuse
> > > > and using incorrectly.
> > >
> > > Great idea, will do!
> > > Guess PF_ flags isn't a scarce resource?
> > 
> > Actually, no, we are out of bits in current->flags already.
> > I could introduce a separate flag into struct task, but that won't
> > work in interrupt contexts - how do you solve that problem for FS/IO
> > allocations?
> 
> I would suggest using bits in the section labelled:
> 
>         /* Unserialized, strictly 'current' */
> 
> since this doesn't need to be accessed from any other task.  Michal,
> can we move PF_MEMALLOC_NOIO and NOFS to that area too?

I wouldn't object. The only reason for using PF_$FOO is historical. It
was XFS which started to use pf flag for NOSF AFAIR. I haven't checked
recently but xfs still used to use its own api in the past.
Michal Hocko March 25, 2020, 6:43 p.m. UTC | #11
On Wed 25-03-20 19:30:09, Alexander Potapenko wrote:
> On Wed, Mar 25, 2020 at 7:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Mar 25, 2020 at 07:03:32PM +0100, Alexander Potapenko wrote:
> > > > I would suggest using bits in the section labelled:
> > > >
> > > >         /* Unserialized, strictly 'current' */
> > >
> > > The main problem is that |current| is unavailable in the interrupt
> > > context, so we'll also need to:
> > >  - disable interrupts when preparing for a KMSAN internal memory
> > > allocation - sounds costly, huh?
> > >  - store the context flag in a per-cpu variable in the case |current|
> > > is unavailable.
> >
> > It's not /unavailable/ ... it's whatever task happens to be running at
> > the time the interrupt is triggered.  You can borrow its task_struct.
> 
> That didn't come to my mind, interesting!
> 
> > You'll have to save off the current value of the flag before setting it,
> > just like memalloc_nofs_save() does.
> > But this does rather call into question whether Michal's advice to use
> > task_struct is good advice to begin with.  For memalloc_nofs/noio,
> > it works well this way because allocations in interrupt context are
> > inherently at a more restrictive context than task level.  It's not clear
> > to me what this kmsan GFP flag is being used for, and whether allocations
> > that happen in interrupt context should inherit the kmsan setting.
> 
> The idea behind this flag is as follows.
> KMSAN must allocate metadata pages for every allocation performed by
> alloc_pages().
> Metadata allocations are also done via alloc_pages(), and for those no
> further metadata needs to be allocated.
> Thus the GFP flag is used to prevent the recursion in alloc_pages().

Kmemleak used the same approach IIRC. It turned out to be unusable in a
presence of any higher GFP_NOWAIT memory pressure. Anyway talk to
Catalin Marinas about problems he had to go through to address them.
Alexander Potapenko March 27, 2020, 12:20 p.m. UTC | #12
On Wed, Mar 25, 2020 at 7:38 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 25-03-20 18:40:29, Alexander Potapenko wrote:
> > On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote:
> > >
> > > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Wed 25-03-20 17:12:14, glider@google.com wrote:
> > > > > This flag is to be used by KMSAN runtime to mark that newly created
> > > > > memory pages don't need KMSAN metadata backing them.
> > > >
> > > > I really dislike an idea of the gfp flag. If you need some form of
> > > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore}
> > > > History tells us that single usecase gfp flags are too tempting to abuse
> > > > and using incorrectly.
> > >
> > > Great idea, will do!
> > > Guess PF_ flags isn't a scarce resource?
> >
> > Actually, no, we are out of bits in current->flags already.
> > I could introduce a separate flag into struct task, but that won't
> > work in interrupt contexts - how do you solve that problem for FS/IO
> > allocations?
>
> NOFS/NOIO is not a problem for IRQ context because we never do reclaim
> from that context.
>
> I was also not aware that there are users from the IRQ context. I
> thought this would be an internal KMSAN stuff. What would be the IRQ
> context you call this from?

KMSAN allocates its metadata lazily*, so if any code does
alloc_pages() from IRQ context, we need to call alloc_pages two more
times for shadow/origin pages.
We also unwind the stack on every creation/copy of an uninitialized
value. Those stacks are stored in the stack depot, which may also
allocate new pages to store stacks.

> Anyway, if you cannot go with the task_struct then a per-cpu value
> should work, right?

Yes, I will try that.

* - as mentioned in the cover letter, a lot of problems could've been
solved if we pre-allocate the metadata pages at boot time so that for
every two contiguous physical pages their metadata pages are also
contiguous.
I haven't come up with a good idea about how to implement that.
Suggestions are very welcome.
--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Alexander Potapenko April 25, 2020, 9:45 a.m. UTC | #13
> * - as mentioned in the cover letter, a lot of problems could've been
> solved if we pre-allocate the metadata pages at boot time so that for
> every two contiguous physical pages their metadata pages are also
> contiguous.
> I haven't come up with a good idea about how to implement that.
> Suggestions are very welcome.

I think I have a working solution now that doesn't require
__GFP_NO_KMSAN_SHADOW. I'll drop this patch from the series.
diff mbox series

Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index be2754841369e..e1ab42b5e9ce2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@  struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_NO_KMSAN_SHADOW  0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -212,12 +213,13 @@  struct vm_area_struct;
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
 #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
+#define __GFP_NO_KMSAN_SHADOW  ((__force gfp_t)___GFP_NO_KMSAN_SHADOW)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (25)
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**