diff mbox series

[v6,2/6] mm/mempolicy: add page allocation function for MPOL_PREFERRED_MANY policy

Message ID 1626077374-81682-3-git-send-email-feng.tang@intel.com (mailing list archive)
State New
Headers show
Series Introduce multi-preference mempolicy | expand

Commit Message

Feng Tang July 12, 2021, 8:09 a.m. UTC
The semantics of MPOL_PREFERRED_MANY are similar to MPOL_PREFERRED,
in that it first tries to allocate memory from the preferred node(s)
and falls back to all nodes in the system when the first attempt fails.

Add a dedicated allocation function for it, just like the 'interleave' policy has.

Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com
Suggested-by: Michal Hocko <mhocko@suse.com>
Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 mm/mempolicy.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Michal Hocko July 28, 2021, 12:42 p.m. UTC | #1
On Mon 12-07-21 16:09:30, Feng Tang wrote:
> The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED,
> that it will first try to allocate memory from the preferred node(s),
> and fallback to all nodes in system when first try fails.
> 
> Add a dedicated function for it just like 'interleave' policy.
> 
> Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> Signed-off-by: Feng Tang <feng.tang@intel.com>

It would be better to squash this together with the actual user of the
function added by the next patch.

> ---
>  mm/mempolicy.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 17b5800b7dcc..d17bf018efcc 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
>  	return page;
>  }
>  
> +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order,
> +						struct mempolicy *pol)

We likely want a node parameter to know which one we want to start with
for locality. Callers should use policy_node for that.

> +{
> +	struct page *page;
> +
> +	/*
> +	 * This is a two pass approach. The first pass will only try the
> +	 * preferred nodes but skip the direct reclaim and allow the
> +	 * allocation to fail, while the second pass will try all the
> +	 * nodes in system.
> +	 */
> +	page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM),
> +				order, first_node(pol->nodes), &pol->nodes);

Although most users will likely have some form of GFP_*USER* here, and
clearing __GFP_DIRECT_RECLAIM will put all other reclaim modifiers out
of the game, I think it would be better to explicitly disable some of
them to prevent surprises. E.g. any potential __GFP_NOFAIL would be more
than surprising here. We do not have any (hopefully), but this should be
pretty cheap to exclude as we already have to modify the mask anyway.

	preferred_gfp = gfp | __GFP_NOWARN;
	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);


> +	if (!page)
> +		page = __alloc_pages(gfp, order, numa_node_id(), NULL);
> +
> +	return page;
> +}
> +
>  /**
>   * alloc_pages_vma - Allocate a page for a VMA.
>   * @gfp: GFP flags.
> -- 
> 2.7.4
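
For reference, a minimal sketch of how the helper might look once both
suggestions above are folded in: an explicit preferred-node argument
(supplied by the caller, e.g. via policy_node()) plus the trimmed gfp
mask for the first pass. This only illustrates the direction discussed
in this thread; the exact name and shape of the merged code may differ.

static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order,
					      int nid, struct mempolicy *pol)
{
	gfp_t preferred_gfp;
	struct page *page;

	/*
	 * Two pass approach: the first pass only tries the preferred
	 * nodes, with direct reclaim and __GFP_NOFAIL masked off and
	 * warnings suppressed, so it is allowed to fail quickly. The
	 * second pass retries with the original gfp mask over all
	 * nodes in the system.
	 */
	preferred_gfp = gfp | __GFP_NOWARN;
	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
	page = __alloc_pages(preferred_gfp, order, nid, &pol->nodes);
	if (!page)
		page = __alloc_pages(gfp, order, nid, NULL);

	return page;
}
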
Feng Tang July 28, 2021, 3:18 p.m. UTC | #2
On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote:
> On Mon 12-07-21 16:09:30, Feng Tang wrote:
> > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED,
> > that it will first try to allocate memory from the preferred node(s),
> > and fallback to all nodes in system when first try fails.
> > 
> > Add a dedicated function for it just like 'interleave' policy.
> > 
> > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> It would be better to squash this together with the actual user of the
> function added by the next patch.
 
Ok, will do

> > ---
> >  mm/mempolicy.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 17b5800b7dcc..d17bf018efcc 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
> >  	return page;
> >  }
> >  
> > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order,
> > +						struct mempolicy *pol)
> 
> We likely want a node parameter to know which one we want to start with
> for locality. Callers should use policy_node for that.
 
Yes, locality should be considered, something like this?

	int pnid, lnid = numa_node_id();

	if (node_isset(lnid, pol->nodes))
		pnid = lnid;
	else
		pnid = first_node(pol->nodes);

	page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM),
				order, pnid, &pol->nodes);
	if (!page)
		page = __alloc_pages(gfp, order, lnid, NULL);
	return page;


> > +{
> > +	struct page *page;
> > +
> > +	/*
> > +	 * This is a two pass approach. The first pass will only try the
> > +	 * preferred nodes but skip the direct reclaim and allow the
> > +	 * allocation to fail, while the second pass will try all the
> > +	 * nodes in system.
> > +	 */
> > +	page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM),
> > +				order, first_node(pol->nodes), &pol->nodes);
> 
> Although most users will likely have some form of GFP_*USER* here and
> clearing __GFP_DIRECT_RECLAIM will put all other reclaim modifiers out
> of game I think it would be better to explicitly disable some of them to
> prevent from surprises. E.g. any potential __GFP_NOFAIL would be more
> than surprising here. We do not have any (hopefully) but this should be
> pretty cheap to exclude as we already have to modify already.
> 
> 	preferred_gfp = gfp | __GFP_NOWARN;
> 	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL)

OK, will add.

Thanks,
Feng

> 
> > +	if (!page)
> > +		page = __alloc_pages(gfp, order, numa_node_id(), NULL);
> > +
> > +	return page;
> > +}
> > +
> >  /**
> >   * alloc_pages_vma - Allocate a page for a VMA.
> >   * @gfp: GFP flags.
> > -- 
> > 2.7.4
> 
> -- 
> Michal Hocko
> SUSE Labs
Feng Tang July 28, 2021, 3:25 p.m. UTC | #3
On Wed, Jul 28, 2021 at 11:18:10PM +0800, Tang, Feng wrote:
> On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote:
> > On Mon 12-07-21 16:09:30, Feng Tang wrote:
> > > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED,
> > > that it will first try to allocate memory from the preferred node(s),
> > > and fallback to all nodes in system when first try fails.
> > > 
> > > Add a dedicated function for it just like 'interleave' policy.
> > > 
> > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com
> > > Suggested-by: Michal Hocko <mhocko@suse.com>
> > > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > 
> > It would be better to squash this together with the actual user of the
> > function added by the next patch.
>  
> Ok, will do
> 
> > > ---
> > >  mm/mempolicy.c | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > > 
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index 17b5800b7dcc..d17bf018efcc 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
> > >  	return page;
> > >  }
> > >  
> > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order,
> > > +						struct mempolicy *pol)
> > 
> > We likely want a node parameter to know which one we want to start with
> > for locality. Callers should use policy_node for that.
>  
> Yes, locality should be considered, something like this?
> 
> 	int pnid, lnid = numa_node_id();
> 
> 	if (is_nodeset(lnid, &pol->nodes))
> 		pnid = local_nid;
> 	else
> 		pnid = first_node(pol->nodes);

One further thought is, if the local node is not in the nodemask,
should we compare the distance of all the nodes in the nodemask
to the local node and choose the shortest?

Thanks,
Feng

> 	page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM),
> 				order, pnid, &pol->nodes);
> 	if (!page)
> 		page = __alloc_pages(gfp, order, lnid, NULL);
> 	return page;
>
Michal Hocko July 28, 2021, 4:14 p.m. UTC | #4
On Wed 28-07-21 23:18:10, Feng Tang wrote:
> On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote:
> > On Mon 12-07-21 16:09:30, Feng Tang wrote:
> > > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED,
> > > that it will first try to allocate memory from the preferred node(s),
> > > and fallback to all nodes in system when first try fails.
> > > 
> > > Add a dedicated function for it just like 'interleave' policy.
> > > 
> > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com
> > > Suggested-by: Michal Hocko <mhocko@suse.com>
> > > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > 
> > It would be better to squash this together with the actual user of the
> > function added by the next patch.
>  
> Ok, will do
> 
> > > ---
> > >  mm/mempolicy.c | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > > 
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index 17b5800b7dcc..d17bf018efcc 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
> > >  	return page;
> > >  }
> > >  
> > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order,
> > > +						struct mempolicy *pol)
> > 
> > We likely want a node parameter to know which one we want to start with
> > for locality. Callers should use policy_node for that.
>  
> Yes, locality should be considered, something like this?
> 
> 	int pnid, lnid = numa_node_id();
> 
> 	if (is_nodeset(lnid, &pol->nodes))
> 		pnid = local_nid;
> 	else
> 		pnid = first_node(pol->nodes);
> 
> 	page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM),
> 				order, pnid, &pol->nodes);
> 	if (!page)
> 		page = __alloc_pages(gfp, order, lnid, NULL);
> 	return page;

No. I really meant to get a node argument and use it as it is. Your
callers already have some node preference, usually the local node, and
as we have a nodemask here we do not really need any special logic here,
as mentioned in the other email. The preferred node will act only as a
source for the zonelist.
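
In other words, a caller such as alloc_pages_vma() could simply forward
the node it already has. A rough sketch, assuming the helper gains the
nid argument discussed above (variable names follow the existing
alloc_pages_vma() code and are illustrative only):

	if (pol->mode == MPOL_PREFERRED_MANY) {
		/* @node is the node preference the caller already holds */
		page = alloc_page_preferred_many(gfp, order, node, pol);
		mpol_cond_put(pol);
		goto out;
	}
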
Michal Hocko July 28, 2021, 4:15 p.m. UTC | #5
On Wed 28-07-21 23:25:07, Feng Tang wrote:
> On Wed, Jul 28, 2021 at 11:18:10PM +0800, Tang, Feng wrote:
> > On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote:
> > > On Mon 12-07-21 16:09:30, Feng Tang wrote:
> > > > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED,
> > > > that it will first try to allocate memory from the preferred node(s),
> > > > and fallback to all nodes in system when first try fails.
> > > > 
> > > > Add a dedicated function for it just like 'interleave' policy.
> > > > 
> > > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com
> > > > Suggested-by: Michal Hocko <mhocko@suse.com>
> > > > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > > 
> > > It would be better to squash this together with the actual user of the
> > > function added by the next patch.
> >  
> > Ok, will do
> > 
> > > > ---
> > > >  mm/mempolicy.c | 19 +++++++++++++++++++
> > > >  1 file changed, 19 insertions(+)
> > > > 
> > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > > index 17b5800b7dcc..d17bf018efcc 100644
> > > > --- a/mm/mempolicy.c
> > > > +++ b/mm/mempolicy.c
> > > > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
> > > >  	return page;
> > > >  }
> > > >  
> > > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order,
> > > > +						struct mempolicy *pol)
> > > 
> > > We likely want a node parameter to know which one we want to start with
> > > for locality. Callers should use policy_node for that.
> >  
> > Yes, locality should be considered, something like this?
> > 
> > 	int pnid, lnid = numa_node_id();
> > 
> > 	if (is_nodeset(lnid, &pol->nodes))
> > 		pnid = local_nid;
> > 	else
> > 		pnid = first_node(pol->nodes);
> 
> One further thought is, if local node is not in the nodemask,
> should we compare the distance of all the nodes in nodemask
> to the local node and chose the shortest? 

Nope, that is what the zonelist is for. The nodemask will do the rest.
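
For context on that last point: when __alloc_pages() is given a
preferred nid together with a nodemask, it effectively walks the
preferred node's zonelist, which is already ordered by NUMA distance,
and skips zones whose node is not set in the mask, so no extra distance
logic is needed on the mempolicy side. Conceptually (an illustrative
sketch, not the actual allocator code; preferred_nid, gfp and pol are
assumed to be in scope):

	struct zoneref *z;
	struct zone *zone;

	for_each_zone_zonelist_nodemask(zone, z,
			node_zonelist(preferred_nid, gfp), gfp_zone(gfp),
			&pol->nodes) {
		/* try zones in increasing distance from @preferred_nid */
	}
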
diff mbox series

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 17b5800b7dcc..d17bf018efcc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2153,6 +2153,25 @@  static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
 	return page;
 }
 
+static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order,
+						struct mempolicy *pol)
+{
+	struct page *page;
+
+	/*
+	 * This is a two pass approach. The first pass will only try the
+	 * preferred nodes but skip the direct reclaim and allow the
+	 * allocation to fail, while the second pass will try all the
+	 * nodes in system.
+	 */
+	page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM),
+				order, first_node(pol->nodes), &pol->nodes);
+	if (!page)
+		page = __alloc_pages(gfp, order, numa_node_id(), NULL);
+
+	return page;
+}
+
 /**
  * alloc_pages_vma - Allocate a page for a VMA.
  * @gfp: GFP flags.