diff mbox

mm: drop VM_BUG_ON from __get_free_pages

Message ID 20180622162841.25114-1-mhocko@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Michal Hocko June 22, 2018, 4:28 p.m. UTC
From: Michal Hocko <mhocko@suse.com>

There is no real reason to blow up just because the caller doesn't know
that __get_free_pages cannot return highmem pages. Simply fix that up
silently. Even if we have some confused users such a fixup will not be
harmful.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Hi Andrew,
previously posted [1] but it fell through cracks. Can we merge it now?

[1] http://lkml.kernel.org/r/20171129160446.jluzpv3n6mjc3fwv@dhcp22.suse.cz

 mm/page_alloc.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Comments

Vlastimil Babka June 26, 2018, 1:57 p.m. UTC | #1
On 06/22/2018 06:28 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> There is no real reason to blow up just because the caller doesn't know
> that __get_free_pages cannot return highmem pages. Simply fix that up
> silently. Even if we have some confused users such a fixup will not be
> harmful.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> Hi Andrew,
> previously posted [1] but it fell through cracks. Can we merge it now?
> 
> [1] http://lkml.kernel.org/r/20171129160446.jluzpv3n6mjc3fwv@dhcp22.suse.cz
> 
>  mm/page_alloc.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1521100f1e63..5f56f662a52d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4402,18 +4402,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>  EXPORT_SYMBOL(__alloc_pages_nodemask);
>  
>  /*
> - * Common helper functions.
> + * Common helper functions. Never use with __GFP_HIGHMEM because the returned
> + * address cannot represent highmem pages. Use alloc_pages and then kmap if
> + * you need to access high mem.
>   */
>  unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
>  {
>  	struct page *page;
>  
> -	/*
> -	 * __get_free_pages() returns a virtual address, which cannot represent
> -	 * a highmem page
> -	 */
> -	VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
> -
>  	page = alloc_pages(gfp_mask, order);

The previous version had also replaced the line above with:

+	page = alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);

This one doesn't, yet you say "fix that up silently". Bug?

>  	if (!page)
>  		return 0;
>
Michal Hocko June 26, 2018, 2:44 p.m. UTC | #2
On Tue 26-06-18 15:57:39, Vlastimil Babka wrote:
> On 06/22/2018 06:28 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > There is no real reason to blow up just because the caller doesn't know
> > that __get_free_pages cannot return highmem pages. Simply fix that up
> > silently. Even if we have some confused users such a fixup will not be
> > harmful.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> > Hi Andrew,
> > previously posted [1] but it fell through cracks. Can we merge it now?
> > 
> > [1] http://lkml.kernel.org/r/20171129160446.jluzpv3n6mjc3fwv@dhcp22.suse.cz
> > 
> >  mm/page_alloc.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 1521100f1e63..5f56f662a52d 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4402,18 +4402,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> >  EXPORT_SYMBOL(__alloc_pages_nodemask);
> >  
> >  /*
> > - * Common helper functions.
> > + * Common helper functions. Never use with __GFP_HIGHMEM because the returned
> > + * address cannot represent highmem pages. Use alloc_pages and then kmap if
> > + * you need to access high mem.
> >   */
> >  unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
> >  {
> >  	struct page *page;
> >  
> > -	/*
> > -	 * __get_free_pages() returns a virtual address, which cannot represent
> > -	 * a highmem page
> > -	 */
> > -	VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
> > -
> >  	page = alloc_pages(gfp_mask, order);
> 
> The previous version had also replaced the line above with:
> 
> +	page = alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);
> 
> This one doesn't, yet you say "fix that up silently". Bug?

got lost somewhere on the way during the discussion. Thanks for spotting
that.

Andrew, could you add gfp_mask & ~__GFP_HIGHMEM please? Or should I
resubmit?
Andrew Morton June 26, 2018, 5:04 p.m. UTC | #3
On Tue, 26 Jun 2018 15:57:39 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 06/22/2018 06:28 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > There is no real reason to blow up just because the caller doesn't know
> > that __get_free_pages cannot return highmem pages. Simply fix that up
> > silently. Even if we have some confused users such a fixup will not be
> > harmful.
> > 
>
> ...
>
> >  /*
> > - * Common helper functions.
> > + * Common helper functions. Never use with __GFP_HIGHMEM because the returned
> > + * address cannot represent highmem pages. Use alloc_pages and then kmap if
> > + * you need to access high mem.
> >   */
> >  unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
> >  {
> >  	struct page *page;
> >  
> > -	/*
> > -	 * __get_free_pages() returns a virtual address, which cannot represent
> > -	 * a highmem page
> > -	 */
> > -	VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
> > -
> >  	page = alloc_pages(gfp_mask, order);
> 
> The previous version had also replaced the line above with:
> 
> +	page = alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);
> 
> This one doesn't, yet you say "fix that up silently". Bug?
> 

This reminds me what is irritating about the patch.  We're adding
additional code to a somewhat fast path to handle something which we
know never happens, thanks to the now-removed check.

This newly-added code might become functional in the future, if people
add incorrect callers.  Callers whose incorrectness would have been
revealed by the now-removed check!

So.. argh.

Really, the changelog isn't right.  There *is* a real reason to blow
up.  Effectively the caller is attempting to obtain the virtual address
of a highmem page without having kmapped it first.  That's an outright
bug.


An alternative might be to just accept the bogus __GFP_HIGHMEM, let
page_to_virt() return a crap address and wait for the user bug reports
to come in when someone tries to run the offending code on a highmem
machine.  That shouldn't take too long - the page allocator will prefer
to return a highmem page in this case.

And adding a rule to the various static checkers should catch most
offenders.

Or just leave the ode as it is now.
Michal Hocko June 27, 2018, 7:34 a.m. UTC | #4
On Tue 26-06-18 10:04:16, Andrew Morton wrote:
> On Tue, 26 Jun 2018 15:57:39 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
> > On 06/22/2018 06:28 PM, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > There is no real reason to blow up just because the caller doesn't know
> > > that __get_free_pages cannot return highmem pages. Simply fix that up
> > > silently. Even if we have some confused users such a fixup will not be
> > > harmful.
> > > 
> >
> > ...
> >
> > >  /*
> > > - * Common helper functions.
> > > + * Common helper functions. Never use with __GFP_HIGHMEM because the returned
> > > + * address cannot represent highmem pages. Use alloc_pages and then kmap if
> > > + * you need to access high mem.
> > >   */
> > >  unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
> > >  {
> > >  	struct page *page;
> > >  
> > > -	/*
> > > -	 * __get_free_pages() returns a virtual address, which cannot represent
> > > -	 * a highmem page
> > > -	 */
> > > -	VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
> > > -
> > >  	page = alloc_pages(gfp_mask, order);
> > 
> > The previous version had also replaced the line above with:
> > 
> > +	page = alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);
> > 
> > This one doesn't, yet you say "fix that up silently". Bug?
> > 
> 
> This reminds me what is irritating about the patch.  We're adding
> additional code to a somewhat fast path to handle something which we
> know never happens, thanks to the now-removed check.
> 
> This newly-added code might become functional in the future, if people
> add incorrect callers.  Callers whose incorrectness would have been
> revealed by the now-removed check!

That check depends on a debugging config option which is not enabled all
the time so how does this help in most production systems? More over it
is "blow up on incorrect use" kind of check. I am pretty sure Linus
would have some word about such error handling...

> So.. argh.
> 
> Really, the changelog isn't right.  There *is* a real reason to blow
> up.  Effectively the caller is attempting to obtain the virtual address
> of a highmem page without having kmapped it first.  That's an outright
> bug.

And as I've argued before the code would be wrong regardless. We would
leak the memory or worse touch somebody's else kmap without knowing
that.  So we have a choice between a mem leak, data corruption k or a
silent fixup. I would prefer the last option. And blowing up on a BUG
is not much better on something that is easily fixable. I am not really
convinced that & ~__GFP_HIGHMEM is something to lose sleep over.
 
> An alternative might be to just accept the bogus __GFP_HIGHMEM, let
> page_to_virt() return a crap address and wait for the user bug reports
> to come in when someone tries to run the offending code on a highmem
> machine.  That shouldn't take too long - the page allocator will prefer
> to return a highmem page in this case.

Well, I would be concerned about page_address returning an active kmap
much more.

> And adding a rule to the various static checkers should catch most
> offenders.

This all sounds like a heavy lifting for something that can be easily
fixed up. If we ever see the additional and with a mask as a problem we
can think about a more optimal solution.
Michal Hocko June 27, 2018, 7:39 a.m. UTC | #5
On Wed 27-06-18 09:34:20, Michal Hocko wrote:
> On Tue 26-06-18 10:04:16, Andrew Morton wrote:
[...]
> > Really, the changelog isn't right.  There *is* a real reason to blow
> > up.  Effectively the caller is attempting to obtain the virtual address
> > of a highmem page without having kmapped it first.  That's an outright
> > bug.
> 
> And as I've argued before the code would be wrong regardless. We would
> leak the memory or worse touch somebody's else kmap without knowing
> that.  So we have a choice between a mem leak, data corruption k or a
> silent fixup. I would prefer the last option. And blowing up on a BUG
> is not much better on something that is easily fixable. I am not really
> convinced that & ~__GFP_HIGHMEM is something to lose sleep over.

It's been some time since I've checked that changelog and you are right
it should contain all the above so the changelog should be:

"
There is no real reason to blow up just because the caller doesn't know
that __get_free_pages cannot return highmem pages. Simply fix that up
silently. Even if we have some confused users such a fixup will not be
harmful.

On the other hand an incorrect usage can lead to either a memory leak
or worse a memory corruption when the allocated page hashes to an
already kmaped page. Most workloads run with CONFIG_DEBUG_VM disabled so
the assert wouldn't help.
"
Vlastimil Babka June 27, 2018, 7:50 a.m. UTC | #6
On 06/27/2018 09:34 AM, Michal Hocko wrote:
> On Tue 26-06-18 10:04:16, Andrew Morton wrote:
> 
> And as I've argued before the code would be wrong regardless. We would
> leak the memory or worse touch somebody's else kmap without knowing
> that.  So we have a choice between a mem leak, data corruption k or a
> silent fixup. I would prefer the last option. And blowing up on a BUG
> is not much better on something that is easily fixable. I am not really
> convinced that & ~__GFP_HIGHMEM is something to lose sleep over.

Maybe put the fixup into a "#ifdef CONFIG_HIGHMEM" block and then modern
systems won't care? In that case it could even be if (WARN_ON_ONCE(...))
so future cases with wrong expectations would become known.

Vlastimil
Michal Hocko June 27, 2018, 7:54 a.m. UTC | #7
On Wed 27-06-18 09:50:01, Vlastimil Babka wrote:
> On 06/27/2018 09:34 AM, Michal Hocko wrote:
> > On Tue 26-06-18 10:04:16, Andrew Morton wrote:
> > 
> > And as I've argued before the code would be wrong regardless. We would
> > leak the memory or worse touch somebody's else kmap without knowing
> > that.  So we have a choice between a mem leak, data corruption k or a
> > silent fixup. I would prefer the last option. And blowing up on a BUG
> > is not much better on something that is easily fixable. I am not really
> > convinced that & ~__GFP_HIGHMEM is something to lose sleep over.
> 
> Maybe put the fixup into a "#ifdef CONFIG_HIGHMEM" block and then modern
> systems won't care? In that case it could even be if (WARN_ON_ONCE(...))
> so future cases with wrong expectations would become known.

Yes that could be done as well. Or maybe we can make __GFP_HIGHMEM 0 for
!HIGHMEM systems. Does something really rely on it being non-zero?
Vlastimil Babka June 27, 2018, 10:47 a.m. UTC | #8
On 06/27/2018 09:54 AM, Michal Hocko wrote:
> On Wed 27-06-18 09:50:01, Vlastimil Babka wrote:
>> On 06/27/2018 09:34 AM, Michal Hocko wrote:
>>> On Tue 26-06-18 10:04:16, Andrew Morton wrote:
>>>
>>> And as I've argued before the code would be wrong regardless. We would
>>> leak the memory or worse touch somebody's else kmap without knowing
>>> that.  So we have a choice between a mem leak, data corruption k or a
>>> silent fixup. I would prefer the last option. And blowing up on a BUG
>>> is not much better on something that is easily fixable. I am not really
>>> convinced that & ~__GFP_HIGHMEM is something to lose sleep over.
>>
>> Maybe put the fixup into a "#ifdef CONFIG_HIGHMEM" block and then modern
>> systems won't care? In that case it could even be if (WARN_ON_ONCE(...))
>> so future cases with wrong expectations would become known.
> 
> Yes that could be done as well. Or maybe we can make __GFP_HIGHMEM 0 for
> !HIGHMEM systems. Does something really rely on it being non-zero?

I guess gfp_zone() would have to be checked, dunno about the rewrite of
GFP_ZONE_TABLE (CCing people).
In general checks like "if (flags & __GFP_HIGHMEM)" would become false,
which probably should not be a problem, unless something expect the flag
to be there and errors out if it isn't.
Michal Hocko June 27, 2018, 11:05 a.m. UTC | #9
On Wed 27-06-18 12:47:39, Vlastimil Babka wrote:
> On 06/27/2018 09:54 AM, Michal Hocko wrote:
> > On Wed 27-06-18 09:50:01, Vlastimil Babka wrote:
> >> On 06/27/2018 09:34 AM, Michal Hocko wrote:
> >>> On Tue 26-06-18 10:04:16, Andrew Morton wrote:
> >>>
> >>> And as I've argued before the code would be wrong regardless. We would
> >>> leak the memory or worse touch somebody's else kmap without knowing
> >>> that.  So we have a choice between a mem leak, data corruption k or a
> >>> silent fixup. I would prefer the last option. And blowing up on a BUG
> >>> is not much better on something that is easily fixable. I am not really
> >>> convinced that & ~__GFP_HIGHMEM is something to lose sleep over.
> >>
> >> Maybe put the fixup into a "#ifdef CONFIG_HIGHMEM" block and then modern
> >> systems won't care? In that case it could even be if (WARN_ON_ONCE(...))
> >> so future cases with wrong expectations would become known.
> > 
> > Yes that could be done as well. Or maybe we can make __GFP_HIGHMEM 0 for
> > !HIGHMEM systems. Does something really rely on it being non-zero?
> 
> I guess gfp_zone() would have to be checked, dunno about the rewrite of
> GFP_ZONE_TABLE (CCing people).
> In general checks like "if (flags & __GFP_HIGHMEM)" would become false,
> which probably should not be a problem, unless something expect the flag
> to be there and errors out if it isn't.

Well, __GFP_HIGHMEM should be basically GFP_KERNEL for !highmem systems.
But most checks I have seen try to mask it off. Having it 0 would help
to reduce at least some code.
Andrew Morton June 27, 2018, 9:14 p.m. UTC | #10
On Wed, 27 Jun 2018 09:50:01 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 06/27/2018 09:34 AM, Michal Hocko wrote:
> > On Tue 26-06-18 10:04:16, Andrew Morton wrote:
> > 
> > And as I've argued before the code would be wrong regardless. We would
> > leak the memory or worse touch somebody's else kmap without knowing
> > that.  So we have a choice between a mem leak, data corruption k or a
> > silent fixup. I would prefer the last option. And blowing up on a BUG
> > is not much better on something that is easily fixable. I am not really
> > convinced that & ~__GFP_HIGHMEM is something to lose sleep over.
> 
> Maybe put the fixup into a "#ifdef CONFIG_HIGHMEM" block and then modern
> systems won't care? In that case it could even be if (WARN_ON_ONCE(...))
> so future cases with wrong expectations would become known.
> 

The more I think about it, the more I like the VM_BUG_ON.

Look, if I was reviewing code which did

	page = alloc_page(__GFP_HIGHMEM);
	addr = page_to_virt(page);

I would say "that's a bug, you forgot to kmap the page".

And any code which does __get_free_pages(__GFP_HIGHMEM) is just as
buggy: it's requesting the virtual address of a high page without
having kmapped it.  Core MM shouldn't be silently kludging around the
bug by restricting the caller to using lowmem pages.

Maybe the caller really does want to use highmem, in which case the caller
should be using alloc_page(__GFP_HIGHMEM) and kmap().  Because core MM
detects and reports this bug, the developer will fix it.
Michal Hocko June 27, 2018, 9:24 p.m. UTC | #11
On Wed 27-06-18 14:14:12, Andrew Morton wrote:
> On Wed, 27 Jun 2018 09:50:01 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
> > On 06/27/2018 09:34 AM, Michal Hocko wrote:
> > > On Tue 26-06-18 10:04:16, Andrew Morton wrote:
> > > 
> > > And as I've argued before the code would be wrong regardless. We would
> > > leak the memory or worse touch somebody's else kmap without knowing
> > > that.  So we have a choice between a mem leak, data corruption k or a
> > > silent fixup. I would prefer the last option. And blowing up on a BUG
> > > is not much better on something that is easily fixable. I am not really
> > > convinced that & ~__GFP_HIGHMEM is something to lose sleep over.
> > 
> > Maybe put the fixup into a "#ifdef CONFIG_HIGHMEM" block and then modern
> > systems won't care? In that case it could even be if (WARN_ON_ONCE(...))
> > so future cases with wrong expectations would become known.
> > 
> 
> The more I think about it, the more I like the VM_BUG_ON.
> 
> Look, if I was reviewing code which did
> 
> 	page = alloc_page(__GFP_HIGHMEM);
> 	addr = page_to_virt(page);
> 
> I would say "that's a bug, you forgot to kmap the page".
> 
> And any code which does __get_free_pages(__GFP_HIGHMEM) is just as
> buggy: it's requesting the virtual address of a high page without
> having kmapped it.  Core MM shouldn't be silently kludging around the
> bug by restricting the caller to using lowmem pages.

I would argue that internal kernel APIs should trust their callers.
Panicing with an unknown context is about the worst way to teach
developers how to use the API properly. Because it will be end users
seeing an outage. So I would simply not care beyond documenting the
expectation. If we want to be more careful then fix it up. If you
disagree then just drop the patch. I do not insist so much to spend much
more time on something I thought was quite obvious. BUG_ON for an inpropoer
API usage is considered harmful for quite a long time by now. I do not
see why this would be any different.
diff mbox

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5f56f662a52d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4402,18 +4402,14 @@  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 EXPORT_SYMBOL(__alloc_pages_nodemask);
 
 /*
- * Common helper functions.
+ * Common helper functions. Never use with __GFP_HIGHMEM because the returned
+ * address cannot represent highmem pages. Use alloc_pages and then kmap if
+ * you need to access high mem.
  */
 unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *page;
 
-	/*
-	 * __get_free_pages() returns a virtual address, which cannot represent
-	 * a highmem page
-	 */
-	VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
-
 	page = alloc_pages(gfp_mask, order);
 	if (!page)
 		return 0;