[v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()

Message ID 20190314094249.19606-1-vbabka@suse.cz (mailing list archive)
State New, archived

Commit Message

Vlastimil Babka March 14, 2019, 9:42 a.m. UTC
alloc_pages_exact*() allocates a page of sufficient order and then splits it
to return only the number of pages requested. That makes it incompatible with
__GFP_COMP, because compound pages cannot be split.

As shown by [1], things may silently work until the requested size (possibly
depending on the user) stops being a power of two. Then, with CONFIG_DEBUG_VM,
BUG_ON() triggers in split_page(). Without CONFIG_DEBUG_VM, the consequences are
unclear.
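
To make the size arithmetic concrete, here is a minimal userspace sketch of how
many pages are kept and how many are handed back after the split. It is not
kernel code: the 4 KiB page size and all sketch_* names are assumptions made
for illustration only.

```c
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Pages the caller keeps: size rounded up to whole pages. */
static size_t sketch_pages_kept(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Pages handed back to the allocator after the split. */
static size_t sketch_pages_trimmed(size_t size)
{
	return ((size_t)1 << sketch_get_order(size)) - sketch_pages_kept(size);
}
```

For a request of five pages the sketch picks order 3 (eight pages), keeps five
and trims three; for a request of four pages nothing is trimmed.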

There are several options here, none of them great:

1) Don't do the splitting when __GFP_COMP is passed, and return the whole
compound page. However, if the caller then returns it via free_pages_exact(),
that will be unexpected and the freeing actions there will be wrong.

2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
things may break later somewhere.

3) Warn and return NULL. However NULL may be unexpected, especially for
small sizes.

This patch picks option 3, as it's best defined.
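
For illustration only, option 3 can be sketched in userspace C roughly as
follows. The flag value SKETCH_GFP_COMP, the helper name, and the use of
malloc() in place of the page allocator are assumptions of this sketch, not
kernel code. The check sits before the allocation so that the early return has
nothing to leak.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for __GFP_COMP; the bit value is made up. */
#define SKETCH_GFP_COMP 0x4000u

static int sketch_warned; /* emulates the "once" part of WARN_ON_ONCE() */

/* Option 3: warn once and refuse the request before allocating anything. */
static void *sketch_alloc_exact_reject(size_t size, unsigned int gfp_mask)
{
	if (gfp_mask & SKETCH_GFP_COMP) {
		if (!sketch_warned) {
			sketch_warned = 1;
			fprintf(stderr, "sketch: __GFP_COMP is not allowed here\n");
		}
		return NULL; /* nothing was allocated yet, so nothing leaks */
	}
	return malloc(size); /* malloc() stands in for the page allocator */
}
```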

[1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Sent v1 before amending commit, sorry.

 mm/page_alloc.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Michal Hocko March 14, 2019, 10:15 a.m. UTC | #1
On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> to return only the number of pages requested. That makes it incompatible with
> __GFP_COMP, because compound pages cannot be split.
> 
> As shown by [1] things may silently work until the requested size (possibly
> depending on user) stops being power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> 
> There are several options here, none of them great:
> 
> 1) Don't do the spliting when __GFP_COMP is passed, and return the whole
> compound page. However if caller then returns it via free_pages_exact(),
> that will be unexpected and the freeing actions there will be wrong.
> 
> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> things may break later somewhere.
> 
> 3) Warn and return NULL. However NULL may be unexpected, especially for
> small sizes.
> 
> This patch picks option 3, as it's best defined.

The question is whether callers of alloc_pages_exact do have any
fallback because if they don't then this is forcing an always fail path
and I strongly suspect this is not really what users want. I would
rather go with 2) because "callers wanted it" is much less probable than
"caller is simply confused and more gfp flags is surely better than
fewer".
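
A minimal userspace sketch of what option 2 amounts to, for comparison. The
flag value FIXUP_GFP_COMP and the helper name are hypothetical; the kernel
would use WARN_ON_ONCE() rather than the hand-rolled "warned" flag shown here.

```c
#include <stdio.h>

/* Hypothetical stand-in for __GFP_COMP; the bit value is made up. */
#define FIXUP_GFP_COMP 0x4000u

/* Option 2: warn once, strip the offending flag, and let the request proceed. */
static unsigned int fixup_sanitize_gfp(unsigned int gfp_mask, int *warned)
{
	if (gfp_mask & FIXUP_GFP_COMP) {
		if (!*warned) {
			*warned = 1;
			fprintf(stderr, "sketch: dropping __GFP_COMP\n");
		}
		gfp_mask &= ~FIXUP_GFP_COMP;
	}
	return gfp_mask;
}
```

The caller still gets memory, which is the point of preferring this option
when there is no fallback path for a NULL return.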

> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Sent v1 before amending commit, sorry.
> 
>  mm/page_alloc.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0b9f577b1a2a..dd3f89e8f88d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>  /**
>   * alloc_pages_exact - allocate an exact number physically-contiguous pages.
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * This function is similar to alloc_pages(), except that it allocates the
>   * minimum number of pages to satisfy the request.  alloc_pages() can only
> @@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>  	unsigned long addr;
>  
>  	addr = __get_free_pages(gfp_mask, order);
> +
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		return NULL;
> +
>  	return make_alloc_exact(addr, order, size);
>  }
>  EXPORT_SYMBOL(alloc_pages_exact);
> @@ -4777,7 +4781,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
>   *			   pages on a node.
>   * @nid: the preferred node ID where memory should be allocated
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * Like alloc_pages_exact(), but try to allocate on node nid first before falling
>   * back.
> @@ -4785,7 +4789,12 @@ EXPORT_SYMBOL(alloc_pages_exact);
>  void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
>  {
>  	unsigned int order = get_order(size);
> -	struct page *p = alloc_pages_node(nid, gfp_mask, order);
> +	struct page *p;
> +
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		return NULL;
> +
> +	p = alloc_pages_node(nid, gfp_mask, order);
>  	if (!p)
>  		return NULL;
>  	return make_alloc_exact((unsigned long)page_address(p), order, size);
> -- 
> 2.20.1
Vlastimil Babka March 14, 2019, 10:30 a.m. UTC | #2
On 3/14/19 11:15 AM, Michal Hocko wrote:
> On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
>> alloc_pages_exact*() allocates a page of sufficient order and then splits it
>> to return only the number of pages requested. That makes it incompatible with
>> __GFP_COMP, because compound pages cannot be split.
>> 
>> As shown by [1] things may silently work until the requested size (possibly
>> depending on user) stops being power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
>> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
>> 
>> There are several options here, none of them great:
>> 
>> 1) Don't do the spliting when __GFP_COMP is passed, and return the whole
>> compound page. However if caller then returns it via free_pages_exact(),
>> that will be unexpected and the freeing actions there will be wrong.
>> 
>> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
>> things may break later somewhere.
>> 
>> 3) Warn and return NULL. However NULL may be unexpected, especially for
>> small sizes.
>> 
>> This patch picks option 3, as it's best defined.
> 
> The question is whether callers of alloc_pages_exact do have any
> fallback because if they don't then this is forcing an always fail path
> and I strongly suspect this is not really what users want. I would
> rather go with 2) because "callers wanted it" is much less probable than
> "caller is simply confused and more gfp flags is surely better than
> fewer".

I initially went with 2 as well, as you can see from v1 :) but then I looked at
the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
that the pages are then mapped to userspace. Breaking that didn't seem good.

The point is that with the warning in place, a developer will immediately know
that they did something wrong, regardless of whether the size is a power of two.
But yeah, if it is the addition of __GFP_COMP that is not deterministic, a bug
can still sit silently for a while.

But maybe we could go with 1) if free_pages_exact() is also adjusted to check
for PageCompound() and free it properly?
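
As a rough userspace model of that idea (not the actual free_pages_exact(),
and with a hypothetical struct and helper name), the free path would have to
branch on whether the block was ever split. A 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define OPT1_PAGE_SIZE 4096u /* assumption for this sketch */

/* Hypothetical description of a buffer returned by the allocator. */
struct opt1_buf {
	bool compound;      /* would be PageCompound(head) in the kernel */
	unsigned int order; /* order of the original allocation */
};

/* How many pages a free_pages_exact()-like helper would have to release. */
static size_t opt1_pages_released(const struct opt1_buf *buf, size_t size)
{
	if (buf->compound)
		return (size_t)1 << buf->order; /* never split: whole block */
	return (size + OPT1_PAGE_SIZE - 1) / OPT1_PAGE_SIZE; /* page by page */
}
```

The two branches release different amounts of memory for the same size
argument, which is where the extra confusion would come from.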

>> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

[2]
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=3a6d1980fe96dbbfe3ae58db0048867f5319cdbf

>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>> Sent v1 before amending commit, sorry.
>> 
>>  mm/page_alloc.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 0b9f577b1a2a..dd3f89e8f88d 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>>  /**
>>   * alloc_pages_exact - allocate an exact number physically-contiguous pages.
>>   * @size: the number of bytes to allocate
>> - * @gfp_mask: GFP flags for the allocation
>> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>>   *
>>   * This function is similar to alloc_pages(), except that it allocates the
>>   * minimum number of pages to satisfy the request.  alloc_pages() can only
>> @@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>>  	unsigned long addr;
>>  
>>  	addr = __get_free_pages(gfp_mask, order);
>> +
>> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
>> +		return NULL;
>> +
>>  	return make_alloc_exact(addr, order, size);
>>  }
>>  EXPORT_SYMBOL(alloc_pages_exact);
>> @@ -4777,7 +4781,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
>>   *			   pages on a node.
>>   * @nid: the preferred node ID where memory should be allocated
>>   * @size: the number of bytes to allocate
>> - * @gfp_mask: GFP flags for the allocation
>> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>>   *
>>   * Like alloc_pages_exact(), but try to allocate on node nid first before falling
>>   * back.
>> @@ -4785,7 +4789,12 @@ EXPORT_SYMBOL(alloc_pages_exact);
>>  void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
>>  {
>>  	unsigned int order = get_order(size);
>> -	struct page *p = alloc_pages_node(nid, gfp_mask, order);
>> +	struct page *p;
>> +
>> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
>> +		return NULL;
>> +
>> +	p = alloc_pages_node(nid, gfp_mask, order);
>>  	if (!p)
>>  		return NULL;
>>  	return make_alloc_exact((unsigned long)page_address(p), order, size);
>> -- 
>> 2.20.1
>
Michal Hocko March 14, 2019, 11:36 a.m. UTC | #3
On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> On 3/14/19 11:15 AM, Michal Hocko wrote:
> > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> >> to return only the number of pages requested. That makes it incompatible with
> >> __GFP_COMP, because compound pages cannot be split.
> >> 
> >> As shown by [1] things may silently work until the requested size (possibly
> >> depending on user) stops being power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> >> 
> >> There are several options here, none of them great:
> >> 
> >> 1) Don't do the spliting when __GFP_COMP is passed, and return the whole
> >> compound page. However if caller then returns it via free_pages_exact(),
> >> that will be unexpected and the freeing actions there will be wrong.
> >> 
> >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> >> things may break later somewhere.
> >> 
> >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> >> small sizes.
> >> 
> >> This patch picks option 3, as it's best defined.
> > 
> > The question is whether callers of alloc_pages_exact do have any
> > fallback because if they don't then this is forcing an always fail path
> > and I strongly suspect this is not really what users want. I would
> > rather go with 2) because "callers wanted it" is much less probable than
> > "caller is simply confused and more gfp flags is surely better than
> > fewer".
> 
> I initially went with 2 as well, as you can see from v1 :) but then I looked at
> the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> that the pages are then mapped to userspace. Breaking that didn't seem good.

It used the flag legitimately before because they were allocating
compound pages but now they don't, so this is just a conversion bug.
Why should we screw up the helper for that reason? Or, put in other words,
why would a silent fixup add any risk?

> The point is that with the warning in place, A developer will immediately know
> that they did something wrong, regardless if the size is power-of-two or not.
> But yeah, if it's adding of __GFP_COMP that is not deterministic, a bug can
> still sit silently for a while.
> 
> But maybe we could go with 1) if free_pages_exact() is also adjusted to check
> for CompoundPage and free it properly?

I dunno, it sounds like it adds even more confusion.

> >> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> 
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=3a6d1980fe96dbbfe3ae58db0048867f5319cdbf
Takashi Iwai March 14, 2019, 11:56 a.m. UTC | #4
On Thu, 14 Mar 2019 12:36:26 +0100,
Michal Hocko wrote:
> 
> On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > On 3/14/19 11:15 AM, Michal Hocko wrote:
> > > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> > >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> > >> to return only the number of pages requested. That makes it incompatible with
> > >> __GFP_COMP, because compound pages cannot be split.
> > >> 
> > >> As shown by [1] things may silently work until the requested size (possibly
> > >> depending on user) stops being power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> > >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> > >> 
> > >> There are several options here, none of them great:
> > >> 
> > >> 1) Don't do the spliting when __GFP_COMP is passed, and return the whole
> > >> compound page. However if caller then returns it via free_pages_exact(),
> > >> that will be unexpected and the freeing actions there will be wrong.
> > >> 
> > >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> > >> things may break later somewhere.
> > >> 
> > >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> > >> small sizes.
> > >> 
> > >> This patch picks option 3, as it's best defined.
> > > 
> > > The question is whether callers of alloc_pages_exact do have any
> > > fallback because if they don't then this is forcing an always fail path
> > > and I strongly suspect this is not really what users want. I would
> > > rather go with 2) because "callers wanted it" is much less probable than
> > > "caller is simply confused and more gfp flags is surely better than
> > > fewer".
> > 
> > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> > that the pages are then mapped to userspace. Breaking that didn't seem good.
> 
> It used the flag legitimately before because they were allocating
> compound pages but now they don't so this is just a conversion bug.

We still use __GFP_COMP for allocation of the sound buffers that are
also mmapped to user-space.  The commit mentioned above [2] was
reverted later.

But honestly speaking, I'm not sure whether we still need the compound
pages.  The change was introduced a long time ago (commit f3d48f0373c1
in 2005).  Is it superfluous nowadays...?

> Why should we screw up the helper for that reason? Or put in other words
> why a silent fix up adds any risk?

IMO, it's good to catch the incompatible usage as early as possible,
so that others won't hit the same failure again like I did.  There
aren't so many users of __GFP_COMP in the whole tree, after all.


thanks,

Takashi

> > The point is that with the warning in place, A developer will immediately know
> > that they did something wrong, regardless if the size is power-of-two or not.
> > But yeah, if it's adding of __GFP_COMP that is not deterministic, a bug can
> > still sit silently for a while.
> > 
> > But maybe we could go with 1) if free_pages_exact() is also adjusted to check
> > for CompoundPage and free it properly?
> 
> I dunno, it sounds like it adds even more confusion.
> 
> > >> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> > 
> > [2]
> > https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=3a6d1980fe96dbbfe3ae58db0048867f5319cdbf
> -- 
> Michal Hocko
> SUSE Labs
>
Michal Hocko March 14, 2019, 12:09 p.m. UTC | #5
On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 12:36:26 +0100,
> Michal Hocko wrote:
> > 
> > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > > On 3/14/19 11:15 AM, Michal Hocko wrote:
> > > > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> > > >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> > > >> to return only the number of pages requested. That makes it incompatible with
> > > >> __GFP_COMP, because compound pages cannot be split.
> > > >> 
> > > >> As shown by [1] things may silently work until the requested size (possibly
> > > >> depending on user) stops being power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> > > >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> > > >> 
> > > >> There are several options here, none of them great:
> > > >> 
> > > >> 1) Don't do the spliting when __GFP_COMP is passed, and return the whole
> > > >> compound page. However if caller then returns it via free_pages_exact(),
> > > >> that will be unexpected and the freeing actions there will be wrong.
> > > >> 
> > > >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> > > >> things may break later somewhere.
> > > >> 
> > > >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> > > >> small sizes.
> > > >> 
> > > >> This patch picks option 3, as it's best defined.
> > > > 
> > > > The question is whether callers of alloc_pages_exact do have any
> > > > fallback because if they don't then this is forcing an always fail path
> > > > and I strongly suspect this is not really what users want. I would
> > > > rather go with 2) because "callers wanted it" is much less probable than
> > > > "caller is simply confused and more gfp flags is surely better than
> > > > fewer".
> > > 
> > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > 
> > It used the flag legitimately before because they were allocating
> > compound pages but now they don't so this is just a conversion bug.
> 
> We still use __GFP_COMP for allocation of the sound buffers that are
> also mmapped to user-space.  The mentioned commit above [2] was
> reverted later.

Yes, I understand that part. __GFP_COMP makes sense on a compound page.
But if you are using alloc_pages_exact then the flag doesn't make sense
because the split should already do what you want. Unless I am missing
something.

> But honestly speaking, I'm not sure whether we still need the compound
> pages.  The change was introduced long time ago (commit f3d48f0373c1
> in 2005).  Is it superfluous nowadays...?

AFAIU alloc_pages_exact should do what you need.

> > Why should we screw up the helper for that reason? Or put in other words
> > why a silent fix up adds any risk?
> 
> IMO, it's good to catch the incompatible usage as early as possible,
> so that others won't hit the same failure again like I did.  There
> aren't so many users of __GFP_COMP in the whole tree, after all.

Yes, completely agreed and warning with a fixup sounds like the safest
option to me. Returning NULL is risky because it essentially introduces a
permanent failure mode as already pointed out.
Takashi Iwai March 14, 2019, 1:15 p.m. UTC | #6
On Thu, 14 Mar 2019 13:09:39 +0100,
Michal Hocko wrote:
> 
> On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 12:36:26 +0100,
> > Michal Hocko wrote:
> > > 
> > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > > > On 3/14/19 11:15 AM, Michal Hocko wrote:
> > > > > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> > > > >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> > > > >> to return only the number of pages requested. That makes it incompatible with
> > > > >> __GFP_COMP, because compound pages cannot be split.
> > > > >> 
> > > > >> As shown by [1] things may silently work until the requested size (possibly
> > > > >> depending on user) stops being power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> > > > >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> > > > >> 
> > > > >> There are several options here, none of them great:
> > > > >> 
> > > > >> 1) Don't do the spliting when __GFP_COMP is passed, and return the whole
> > > > >> compound page. However if caller then returns it via free_pages_exact(),
> > > > >> that will be unexpected and the freeing actions there will be wrong.
> > > > >> 
> > > > >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> > > > >> things may break later somewhere.
> > > > >> 
> > > > >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> > > > >> small sizes.
> > > > >> 
> > > > >> This patch picks option 3, as it's best defined.
> > > > > 
> > > > > The question is whether callers of alloc_pages_exact do have any
> > > > > fallback because if they don't then this is forcing an always fail path
> > > > > and I strongly suspect this is not really what users want. I would
> > > > > rather go with 2) because "callers wanted it" is much less probable than
> > > > > "caller is simply confused and more gfp flags is surely better than
> > > > > fewer".
> > > > 
> > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > 
> > > It used the flag legitimately before because they were allocating
> > > compound pages but now they don't so this is just a conversion bug.
> > 
> > We still use __GFP_COMP for allocation of the sound buffers that are
> > also mmapped to user-space.  The mentioned commit above [2] was
> > reverted later.
> 
> Yes, I understand that part. __GFP_COMP makes sense on a comound page.
> But if you are using alloc_pages_exact then the flag doesn't make sense
> because split out should already do what you want. Unless I am missing
> something.

The __GFP_COMP was taken as a sort of workaround for a problem with
mmap whose details I have already forgotten.  If it can be eliminated,
it's all good.

> > But honestly speaking, I'm not sure whether we still need the compound
> > pages.  The change was introduced long time ago (commit f3d48f0373c1
> > in 2005).  Is it superfluous nowadays...?
> 
> AFAIU alloc_pages_exact should do do what you need.

OK, I'll check whether it works with alloc_pages_exact() and without
__GFP_COMP.


Thanks!

Takashi
Michal Hocko March 14, 2019, 1:29 p.m. UTC | #7
On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 13:09:39 +0100,
> Michal Hocko wrote:
> > 
> > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > Michal Hocko wrote:
> > > > 
> > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
[...]
> > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > 
> > > > It used the flag legitimately before because they were allocating
> > > > compound pages but now they don't so this is just a conversion bug.
> > > 
> > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > also mmapped to user-space.  The mentioned commit above [2] was
> > > reverted later.
> > 
> > Yes, I understand that part. __GFP_COMP makes sense on a comound page.
> > But if you are using alloc_pages_exact then the flag doesn't make sense
> > because split out should already do what you want. Unless I am missing
> > something.
> 
> The __GFP_COMP was taken as a sort of workaround for the problem wrt
> mmap I already forgot.  If it can be eliminated, it's all good.

Without __GFP_COMP you would get tail pages which are not set up properly
AFAIU. With alloc_pages_exact you should get an "array" of head pages
which are properly reference counted. But I might have misunderstood the
original problem which __GFP_COMP tried to solve.
Takashi Iwai March 14, 2019, 4:52 p.m. UTC | #8
On Thu, 14 Mar 2019 14:29:33 +0100,
Michal Hocko wrote:
> 
> On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 13:09:39 +0100,
> > Michal Hocko wrote:
> > > 
> > > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > > Michal Hocko wrote:
> > > > > 
> > > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> [...]
> > > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > > the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> > > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > > 
> > > > > It used the flag legitimately before because they were allocating
> > > > > compound pages but now they don't so this is just a conversion bug.
> > > > 
> > > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > > also mmapped to user-space.  The mentioned commit above [2] was
> > > > reverted later.
> > > 
> > > Yes, I understand that part. __GFP_COMP makes sense on a comound page.
> > > But if you are using alloc_pages_exact then the flag doesn't make sense
> > > because split out should already do what you want. Unless I am missing
> > > something.
> > 
> > The __GFP_COMP was taken as a sort of workaround for the problem wrt
> > mmap I already forgot.  If it can be eliminated, it's all good.
> 
> Without __GFP_COMP you would get tail pages which are not setup properly
> AFAIU. With alloc_pages_exact you should get an "array" of head pages
> which are properly reference counted. But I might misunderstood the
> original problem which __GFP_COMP tried to solve.

I only vaguely remember that it was about a Bad Page error for the
reserved pages, but I forgot all the details, sorry.

Hugh, could you confirm whether we still need __GFP_COMP in the sound
buffer allocations?  FWIW, it's the change introduced by the ancient
commit f3d48f0373c1.


thanks,

Takashi
Hugh Dickins March 14, 2019, 5:37 p.m. UTC | #9
On Thu, 14 Mar 2019, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 14:29:33 +0100,
> Michal Hocko wrote:
> > 
> > On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> > > On Thu, 14 Mar 2019 13:09:39 +0100,
> > > Michal Hocko wrote:
> > > > 
> > > > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > > > Michal Hocko wrote:
> > > > > > 
> > > > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > [...]
> > > > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > > > the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> > > > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > > > 
> > > > > > It used the flag legitimately before because they were allocating
> > > > > > compound pages but now they don't so this is just a conversion bug.
> > > > > 
> > > > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > > > also mmapped to user-space.  The mentioned commit above [2] was
> > > > > reverted later.
> > > > 
> > > > Yes, I understand that part. __GFP_COMP makes sense on a comound page.
> > > > But if you are using alloc_pages_exact then the flag doesn't make sense
> > > > because split out should already do what you want. Unless I am missing
> > > > something.
> > > 
> > > The __GFP_COMP was taken as a sort of workaround for the problem wrt
> > > mmap I already forgot.  If it can be eliminated, it's all good.
> > 
> > Without __GFP_COMP you would get tail pages which are not setup properly
> > AFAIU. With alloc_pages_exact you should get an "array" of head pages
> > which are properly reference counted. But I might misunderstood the
> > original problem which __GFP_COMP tried to solve.
> 
> I only vaguely remember that it was about a Bad Page error for the
> reserved pages, but forgot the all details, sorry.
> 
> Hugh, could you confirm whether we still need __GFP_COMP in the sound
> buffer allocations?  FWIW, it's the change introduced by the ancient
> commit f3d48f0373c1.

I'm not confident in finding all "the sound buffer allocations".
Where you're using alloc_pages_exact() for them, you do not need
__GFP_COMP, and should not pass it.  But if there are other places
where you use one of those page allocators with an "order" argument
non-zero, and map that buffer into userspace (without any split_page()),
there you would still need the __GFP_COMP - zap_pte_range() and others
do the wrong thing on tail ptes if the non-zero-order page has neither
been set up as compound nor split into zero-order pages.

Hugh
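
The rule above can be condensed into a small predicate. This is only a
userspace restatement of the paragraph; the function name and its three
parameters are hypothetical and do not correspond to any kernel API.

```c
#include <stdbool.h>

/*
 * A buffer needs to be a compound page only when it is a single
 * non-zero-order allocation that is mapped into userspace and was
 * never split into order-zero pages.
 */
static bool sketch_needs_gfp_comp(unsigned int order, bool mapped_to_user,
				  bool was_split)
{
	return order > 0 && mapped_to_user && !was_split;
}
```

With alloc_pages_exact() the block is always split, so the predicate is false
regardless of the order that was used internally.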
Takashi Iwai March 14, 2019, 6 p.m. UTC | #10
On Thu, 14 Mar 2019 18:37:06 +0100,
Hugh Dickins wrote:
> 
> On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 14:29:33 +0100,
> > Michal Hocko wrote:
> > > 
> > > On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> > > > On Thu, 14 Mar 2019 13:09:39 +0100,
> > > > Michal Hocko wrote:
> > > > > 
> > > > > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > > > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > > > > Michal Hocko wrote:
> > > > > > > 
> > > > > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > > [...]
> > > > > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > > > > the commit [2] mentioned in [1] and I think ALSA legitimaly uses __GFP_COMP so
> > > > > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > > > > 
> > > > > > > It used the flag legitimately before because they were allocating
> > > > > > > compound pages but now they don't so this is just a conversion bug.
> > > > > > 
> > > > > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > > > > also mmapped to user-space.  The mentioned commit above [2] was
> > > > > > reverted later.
> > > > > 
> > > > > Yes, I understand that part. __GFP_COMP makes sense on a comound page.
> > > > > But if you are using alloc_pages_exact then the flag doesn't make sense
> > > > > because split out should already do what you want. Unless I am missing
> > > > > something.
> > > > 
> > > > The __GFP_COMP was taken as a sort of workaround for the problem wrt
> > > > mmap I already forgot.  If it can be eliminated, it's all good.
> > > 
> > > Without __GFP_COMP you would get tail pages which are not setup properly
> > > AFAIU. With alloc_pages_exact you should get an "array" of head pages
> > > which are properly reference counted. But I might misunderstood the
> > > original problem which __GFP_COMP tried to solve.
> > 
> > I only vaguely remember that it was about a Bad Page error for the
> > reserved pages, but forgot the all details, sorry.
> > 
> > Hugh, could you confirm whether we still need __GFP_COMP in the sound
> > buffer allocations?  FWIW, it's the change introduced by the ancient
> > commit f3d48f0373c1.
> 
> I'm not confident in finding all "the sound buffer allocations".
> Where you're using alloc_pages_exact() for them, you do not need
> __GFP_COMP, and should not pass it.

It was my faulty attempt to convert to alloc_pages_exact() that hit
the incompatibility with __GFP_COMP, so it was reverted in the end.

> But if there are other places
> where you use one of those page allocators with an "order" argument
> non-zero, and map that buffer into userspace (without any split_page()),
> there you would still need the __GFP_COMP - zap_pte_range() and others
> do the wrong thing on tail ptes if the non-zero-order page has neither
> been set up as compound nor split into zero-order pages.

Hm, what if we allocate the whole pages via alloc_pages_exact() (but
without __GFP_COMP)?  Can we mmap them properly to user-space like
before, or won't it work as-is?


thanks,

Takashi
Hugh Dickins March 14, 2019, 6:15 p.m. UTC | #11
On Thu, 14 Mar 2019, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 18:37:06 +0100, Hugh Dickins wrote:
> > On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > > 
> > > Hugh, could you confirm whether we still need __GFP_COMP in the sound
> > > buffer allocations?  FWIW, it's the change introduced by the ancient
> > > commit f3d48f0373c1.
> > 
> > I'm not confident in finding all "the sound buffer allocations".
> > Where you're using alloc_pages_exact() for them, you do not need
> > __GFP_COMP, and should not pass it.
> 
> It was my faulty attempt to convert to alloc_pages_exact() that hit
> the incompatibility with __GFP_COMP, so it was reverted in the end.
> 
> > But if there are other places
> > where you use one of those page allocators with an "order" argument
> > non-zero, and map that buffer into userspace (without any split_page()),
> > there you would still need the __GFP_COMP - zap_pte_range() and others
> > do the wrong thing on tail ptes if the non-zero-order page has neither
> > been set up as compound nor split into zero-order pages.
> 
> Hm, what if we allocate the whole pages via alloc_pages_exact() (but
> without __GFP_COMP)?  Can we mmap them properly to user-space like
> before, or it won't work as-is?

Yes, you can map the alloc_pages_exact() pages to user-space as
before, whether or not it ended up using a whole non-zero-order page:
alloc_pages_exact() does a split_page(), so the subpages end up all just
ordinary order-zero pages (and need to be freed individually, which
free_pages_exact() does for you).

Hugh
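
The page arithmetic behind Hugh's reply above can be sketched in user space. This is only an illustrative model, not kernel code: the names model_order_for_size(), model_pages_kept() and model_pages_trimmed() are hypothetical, and a 4 KiB page size is assumed. It shows how many order-zero pages the caller keeps and how many surplus pages are handed back after the split.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. Hypothetical stand-ins for PAGE_SHIFT/PAGE_SIZE. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/*
 * Smallest order such that (MODEL_PAGE_SIZE << order) >= size.
 * User-space stand-in for what get_order() computes; intended only
 * for modest sizes (no overflow handling).
 */
static unsigned int model_order_for_size(size_t size)
{
	unsigned int order = 0;

	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Pages the caller actually keeps: size rounded up to whole pages. */
static unsigned long model_pages_kept(size_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/* Surplus pages of the order-sized block that are freed after the split. */
static unsigned long model_pages_trimmed(size_t size)
{
	return (1UL << model_order_for_size(size)) - model_pages_kept(size);
}
```

For a request of five pages, the model picks an order-3 block (eight pages), keeps five and trims three. A compound page cannot be split this way, which is the incompatibility with __GFP_COMP discussed in this thread.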
kirill.shutemov@linux.intel.com March 14, 2019, 6:51 p.m. UTC | #12
On Thu, Mar 14, 2019 at 09:42:49AM +0000, Vlastimil Babka wrote:
> @@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>  /**
>   * alloc_pages_exact - allocate an exact number physically-contiguous pages.
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * This function is similar to alloc_pages(), except that it allocates the
>   * minimum number of pages to satisfy the request.  alloc_pages() can only
> @@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>  	unsigned long addr;
>  
>  	addr = __get_free_pages(gfp_mask, order);
> +
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		return NULL;
> +

Shouldn't it be before __get_free_pages() call? :P

>  	return make_alloc_exact(addr, order, size);
>  }
>  EXPORT_SYMBOL(alloc_pages_exact);
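
The ordering concern raised above can be modelled in user space. The following is a minimal sketch under stated assumptions, not the kernel implementation: malloc() stands in for the page allocator, MODEL_FLAG_COMP is a hypothetical stand-in for __GFP_COMP, and a counter tracks outstanding allocations so the difference between the two orderings is visible.

```c
#include <stdlib.h>

/* Hypothetical stand-in for __GFP_COMP in this user-space model. */
#define MODEL_FLAG_COMP 0x1u

/* Number of model allocations that have not been freed. */
static unsigned long model_live_allocations;

/* malloc() wrapper that counts outstanding allocations. */
static void *model_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		model_live_allocations++;
	return p;
}

/* Ordering as in the quoted hunk: allocate first, then reject the flag. */
static void *model_alloc_check_after(size_t size, unsigned int flags)
{
	void *p = model_alloc(size);

	if (flags & MODEL_FLAG_COMP)
		return NULL;	/* p is dropped without being freed */
	return p;
}

/* Suggested ordering: reject the flag before allocating anything. */
static void *model_alloc_check_before(size_t size, unsigned int flags)
{
	if (flags & MODEL_FLAG_COMP)
		return NULL;
	return model_alloc(size);
}
```

Both variants return NULL when the flag is set, but only model_alloc_check_after() leaves an allocation outstanding, which is why the check is better placed first.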
Takashi Iwai March 14, 2019, 8:13 p.m. UTC | #13
On Thu, 14 Mar 2019 19:15:22 +0100,
Hugh Dickins wrote:
> 
> On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 18:37:06 +0100, Hugh Dickins wrote:
> > > On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > > > 
> > > > Hugh, could you confirm whether we still need __GFP_COMP in the sound
> > > > buffer allocations?  FWIW, it's the change introduced by the ancient
> > > > commit f3d48f0373c1.
> > > 
> > > I'm not confident in finding all "the sound buffer allocations".
> > > Where you're using alloc_pages_exact() for them, you do not need
> > > __GFP_COMP, and should not pass it.
> > 
> > > It was my faulty attempt to convert to alloc_pages_exact() that hit
> > > the incompatibility with __GFP_COMP, so it was reverted in the end.
> > 
> > > But if there are other places
> > > where you use one of those page allocators with an "order" argument
> > > non-zero, and map that buffer into userspace (without any split_page()),
> > > there you would still need the __GFP_COMP - zap_pte_range() and others
> > > do the wrong thing on tail ptes if the non-zero-order page has neither
> > > been set up as compound nor split into zero-order pages.
> > 
> > Hm, what if we allocate the whole pages via alloc_pages_exact() (but
> > without __GFP_COMP)?  Can we mmap them properly to user-space like
> > before, or it won't work as-is?
> 
> Yes, you can map the alloc_pages_exact() pages to user-space as
> before, whether or not it ended up using a whole non-zero-order page:
> alloc_pages_exact() does a split_page(), so the subpages end up all just
> ordinary order-zero pages (and need to be freed individually, which
> free_pages_exact() does for you).

Great, thanks for clarification!


Takashi
Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b9f577b1a2a..dd3f89e8f88d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4752,7 +4752,7 @@  static void *make_alloc_exact(unsigned long addr, unsigned int order,
 /**
  * alloc_pages_exact - allocate an exact number physically-contiguous pages.
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * This function is similar to alloc_pages(), except that it allocates the
  * minimum number of pages to satisfy the request.  alloc_pages() can only
@@ -4768,6 +4768,10 @@  void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
 	unsigned long addr;
 
 	addr = __get_free_pages(gfp_mask, order);
+
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		return NULL;
+
 	return make_alloc_exact(addr, order, size);
 }
 EXPORT_SYMBOL(alloc_pages_exact);
@@ -4777,7 +4781,7 @@  EXPORT_SYMBOL(alloc_pages_exact);
  *			   pages on a node.
  * @nid: the preferred node ID where memory should be allocated
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * Like alloc_pages_exact(), but try to allocate on node nid first before falling
  * back.
@@ -4785,7 +4789,12 @@  EXPORT_SYMBOL(alloc_pages_exact);
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
 	unsigned int order = get_order(size);
-	struct page *p = alloc_pages_node(nid, gfp_mask, order);
+	struct page *p;
+
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		return NULL;
+
+	p = alloc_pages_node(nid, gfp_mask, order);
 	if (!p)
 		return NULL;
 	return make_alloc_exact((unsigned long)page_address(p), order, size);