
[2/3] mm,thp,shm: limit gfp mask to no more than specified

Message ID 20201124194925.623931-3-riel@surriel.com (mailing list archive)
State New, archived
Series mm,thp,shm: limit shmem THP alloc gfp_mask

Commit Message

Rik van Riel Nov. 24, 2020, 7:49 p.m. UTC
Matthew Wilcox pointed out that the i915 driver opportunistically
allocates tmpfs memory, but will happily reclaim some of its
pool if no memory is available.

Make sure the gfp mask used to opportunistically allocate a THP
is always at least as restrictive as the original gfp mask.

Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
---
 mm/shmem.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Michal Hocko Nov. 26, 2020, 1:40 p.m. UTC | #1
On Tue 24-11-20 14:49:24, Rik van Riel wrote:
> Matthew Wilcox pointed out that the i915 driver opportunistically
> allocates tmpfs memory, but will happily reclaim some of its
> pool if no memory is available.
> 
> Make sure the gfp mask used to opportunistically allocate a THP
> is always at least as restrictive as the original gfp mask.

I have brought this up in the review of the previous version and I feel
my feedback hasn't been addressed. Please describe the expected behavior
of those shmem users, including the GFP_KERNEL restriction, which would
make the THP flags incompatible. Is this a problem? Is there any actual
problem if the THP uses its own set of flags?

I am also not happy about how those two sets of flags are completely
detached; we can only expect surprises there.

> Signed-off-by: Rik van Riel <riel@surriel.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> ---
>  mm/shmem.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6c3cb192a88d..ee3cea10c2a4 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1531,6 +1531,26 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
>  	return page;
>  }
>  
> +/*
> + * Make sure huge_gfp is always more limited than limit_gfp.
> + * Some of the flags set permissions, while others set limitations.
> + */
> +static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
> +{
> +	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
> +	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
> +	gfp_t result = huge_gfp & ~allowflags;
> +
> +	/*
> +	 * Minimize the result gfp by taking the union with the deny flags,
> +	 * and the intersection of the allow flags.
> +	 */
> +	result |= (limit_gfp & denyflags);
> +	result |= (huge_gfp & limit_gfp) & allowflags;
> +
> +	return result;
> +}
> +
>  static struct page *shmem_alloc_hugepage(gfp_t gfp,
>  		struct shmem_inode_info *info, pgoff_t index)
>  {
> @@ -1889,6 +1909,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>  
>  alloc_huge:
>  	huge_gfp = vma_thp_gfp_mask(vma);
> +	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
>  	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
>  	if (IS_ERR(page)) {
>  alloc_nohuge:
> -- 
> 2.25.4
>
Rik van Riel Nov. 26, 2020, 6:04 p.m. UTC | #2
On Thu, 2020-11-26 at 14:40 +0100, Michal Hocko wrote:
> On Tue 24-11-20 14:49:24, Rik van Riel wrote:
> > Matthew Wilcox pointed out that the i915 driver opportunistically
> > allocates tmpfs memory, but will happily reclaim some of its
> > pool if no memory is available.
> > 
> > Make sure the gfp mask used to opportunistically allocate a THP
> > is always at least as restrictive as the original gfp mask.
> 
> I have brought this up in the review of the previous version and I feel
> my feedback hasn't been addressed. Please describe the expected behavior
> of those shmem users, including the GFP_KERNEL restriction, which would
> make the THP flags incompatible. Is this a problem? Is there any actual
> problem if the THP uses its own set of flags?

In the case of i915, the gfp flags passed in by the i915
driver expect the VM to easily fail the allocation, in
which case the i915 driver will reclaim some existing
buffers and try again.

Trying harder than the original gfp_mask would change
the OOM behavior of systems using the i915 driver.

> I am also not happy about how those two sets of flags are completely
> detached; we can only expect surprises there.

I would be more than happy to implement things differently,
but I am not sure what alternative you are suggesting.
Michal Hocko Nov. 27, 2020, 7:52 a.m. UTC | #3
On Thu 26-11-20 13:04:14, Rik van Riel wrote:
> On Thu, 2020-11-26 at 14:40 +0100, Michal Hocko wrote:
> > On Tue 24-11-20 14:49:24, Rik van Riel wrote:
> > > Matthew Wilcox pointed out that the i915 driver opportunistically
> > > allocates tmpfs memory, but will happily reclaim some of its
> > > pool if no memory is available.
> > > 
> > > Make sure the gfp mask used to opportunistically allocate a THP
> > > is always at least as restrictive as the original gfp mask.
> > 
> > I have brought this up in the review of the previous version and I
> > feel my feedback hasn't been addressed. Please describe the expected
> > behavior of those shmem users, including the GFP_KERNEL restriction,
> > which would make the THP flags incompatible. Is this a problem? Is
> > there any actual problem if the THP uses its own set of flags?
> 
> In the case of i915, the gfp flags passed in by the i915
> driver expect the VM to easily fail the allocation, in
> which case the i915 driver will reclaim some existing
> buffers and try again.

The existing code tries hard to avoid the oom killer AFAIU.
At least that is what i915_gem_object_get_pages_gtt says. And that is
ok for order-0 (or low order) requests. But THPs are costly orders and
therefore __GFP_NORETRY has a different meaning there. It still controls
how hard to try to compact, but it is not an OOM control. ttm_tt_swapout
is similar, except it has chosen to try harder for order-0 cases while
still wanting to prevent the oom killer.

> Trying harder than the original gfp_mask would change the OOM behavior
> of systems using the i915 driver.
> 
> > I am also not happy about how those two sets of flags are completely
> > detached; we can only expect surprises there.
> 
> I would be more than happy to implement things differently,
> but I am not sure what alternative you are suggesting.

Simply do not alter gfp flags? Or warn in some cases of a serious mismatch.
E.g. GFP_ZONEMASK mismatch because there are already GFP_KERNEL users of
shmem.
Rik van Riel Nov. 27, 2020, 7:03 p.m. UTC | #4
On Fri, 2020-11-27 at 08:52 +0100, Michal Hocko wrote:
> On Thu 26-11-20 13:04:14, Rik van Riel wrote:
> > 
> > I would be more than happy to implement things differently,
> > but I am not sure what alternative you are suggesting.
> 
> Simply do not alter gfp flags? Or warn in some cases of a serious
> mismatch.
> E.g. GFP_ZONEMASK mismatch because there are already GFP_KERNEL users
> of
> shmem.

Not altering the gfp flags is not really an option,
because that would lead to attempting to allocate THPs
with GFP_HIGHUSER, which is what is used to allocate
regular tmpfs pages.

If the THP configuration in sysfs says we should not
be doing compaction/reclaim from THP allocations, we
should obey that configuration setting, and use
gfp flags that result in no compaction/reclaim being done.
Michal Hocko Nov. 30, 2020, 10 a.m. UTC | #5
On Fri 27-11-20 14:03:39, Rik van Riel wrote:
> On Fri, 2020-11-27 at 08:52 +0100, Michal Hocko wrote:
> > On Thu 26-11-20 13:04:14, Rik van Riel wrote:
> > > 
> > > I would be more than happy to implement things differently,
> > > but I am not sure what alternative you are suggesting.
> > 
> > Simply do not alter gfp flags? Or warn in some cases of a serious
> > mismatch.
> > E.g. GFP_ZONEMASK mismatch because there are already GFP_KERNEL users
> > of
> > shmem.
> 
> Not altering the gfp flags is not really an option,
> because that would lead to attempting to allocate THPs
> with GFP_HIGHUSER, which is what is used to allocate
> regular tmpfs pages.

Right but that is a completely different reason to alter the mask and it
would be really great to know whether this is a theoretical concern or
those users simply do not ever use THPs. Btw. should they be using THPs
even if they opt themselves into GFP_KERNEL restriction?
 
> If the THP configuration in sysfs says we should not
> be doing compaction/reclaim from THP allocations, we
> should obey that configuration setting, and use
> gfp flags that result in no compaction/reclaim being done.

Yes, I agree with that. The thing I disagree with is that you also take
"how hard to try" from the shmem users, which are not really THP aware
and merely want to control how hard to try for order-0 pages. Or, more
precisely, whether to invoke the OOM killer before doing their fallback.

So your patch adds a very subtle behavior that would be really hard to
maintain long term, because how hard to compact is completely detached
from the users who set the gfp mask.
Rik van Riel Nov. 30, 2020, 2:40 p.m. UTC | #6
On Mon, 2020-11-30 at 11:00 +0100, Michal Hocko wrote:
> On Fri 27-11-20 14:03:39, Rik van Riel wrote:
> > On Fri, 2020-11-27 at 08:52 +0100, Michal Hocko wrote:
> > > On Thu 26-11-20 13:04:14, Rik van Riel wrote:
> > > > I would be more than happy to implement things differently,
> > > > but I am not sure what alternative you are suggesting.
> > > 
> > > Simply do not alter gfp flags? Or warn in some cases of a serious
> > > mismatch.
> > > E.g. GFP_ZONEMASK mismatch because there are already GFP_KERNEL
> > > users
> > > of
> > > shmem.
> > 
> > Not altering the gfp flags is not really an option,
> > because that would lead to attempting to allocate THPs
> > with GFP_HIGHUSER, which is what is used to allocate
> > regular tmpfs pages.
> 
> Right but that is a completely different reason to alter the mask and it
> would be really great to know whether this is a theoretical concern or
> those users simply do not ever use THPs. Btw. should they be using THPs
> even if they opt themselves into GFP_KERNEL restriction?

I suppose disabling THPs completely if the gfp_mask
passed to shmem_getpage_gfp() is not GFP_HIGHUSER
is another option.

That seems like it might come with its own pitfalls,
though, both functionality wise and maintenance wise.

Does anyone have strong feelings between "limit gfp_mask"
and "disable THP for !GFP_HIGHUSER"?

Patch

diff --git a/mm/shmem.c b/mm/shmem.c
index 6c3cb192a88d..ee3cea10c2a4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1531,6 +1531,26 @@  static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 	return page;
 }
 
+/*
+ * Make sure huge_gfp is always more limited than limit_gfp.
+ * Some of the flags set permissions, while others set limitations.
+ */
+static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
+{
+	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
+	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
+	gfp_t result = huge_gfp & ~allowflags;
+
+	/*
+	 * Minimize the result gfp by taking the union with the deny flags,
+	 * and the intersection of the allow flags.
+	 */
+	result |= (limit_gfp & denyflags);
+	result |= (huge_gfp & limit_gfp) & allowflags;
+
+	return result;
+}
+
 static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		struct shmem_inode_info *info, pgoff_t index)
 {
@@ -1889,6 +1909,7 @@  static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
 alloc_huge:
 	huge_gfp = vma_thp_gfp_mask(vma);
+	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
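
As a rough worked example of the flag algebra in limit_gfp_mask() above,
here is a standalone C sketch. It is an illustration only: the bit values
are placeholders, not the kernel's real __GFP_* definitions, and the
huge/limit masks in main() are assumed, i915-style inputs rather than
values taken from the actual driver code.

/*
 * Standalone sketch of the limit_gfp_mask() flag algebra.
 * Placeholder bit values, NOT the kernel's real __GFP_* definitions;
 * the masks in main() are assumed inputs for illustration.
 */
#include <stdio.h>

typedef unsigned int gfp_t;

/* placeholder bit assignments, chosen only so the example compiles */
#define __GFP_IO		0x001u
#define __GFP_FS		0x002u
#define __GFP_DIRECT_RECLAIM	0x004u
#define __GFP_KSWAPD_RECLAIM	0x008u
#define __GFP_RECLAIM		(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define __GFP_NOWARN		0x010u
#define __GFP_NORETRY		0x020u
#define __GFP_HIGHMEM		0x040u
#define __GFP_HARDWALL		0x080u

#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_HIGHUSER	(GFP_KERNEL | __GFP_HARDWALL | __GFP_HIGHMEM)

/* same structure as the patch: "allow" bits are intersected with the
 * limiting mask, "deny" bits are unioned in from it */
static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
{
	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
	gfp_t result = huge_gfp & ~allowflags;

	result |= (limit_gfp & denyflags);
	result |= (huge_gfp & limit_gfp) & allowflags;

	return result;
}

int main(void)
{
	/* hypothetical THP mask: reclaim allowed, but don't retry hard */
	gfp_t huge_gfp = GFP_KERNEL | __GFP_NORETRY;
	/* hypothetical caller mask in the i915 style discussed above:
	 * fail easily (__GFP_NORETRY) and stay quiet (__GFP_NOWARN) */
	gfp_t limit_gfp = GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOWARN;

	/* IO/FS/reclaim survive only if both masks allow them; the
	 * caller's NORETRY and NOWARN are carried over unconditionally */
	printf("combined gfp: %#x\n", limit_gfp_mask(huge_gfp, limit_gfp));
	return 0;
}

With these placeholder inputs the result keeps the IO/FS/reclaim bits
(both masks allow them) plus __GFP_NORETRY and __GFP_NOWARN from the
caller, so the THP allocation never tries harder than the limiting mask,
which is the property the patch is after.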