diff mbox series

mm/mempolicy: Fix redundant check and refine lock protection scope in init_nodemask_of_mempolicy

Message ID 20241121122409.277927-1-zhen.ni@easystack.cn (mailing list archive)
State New

Commit Message

Zhen Ni Nov. 21, 2024, 12:24 p.m. UTC
1. Removing redundant checks for current->mempolicy, with a more concise
check order.

2. Using READ_ONCE(current->mempolicy) for safe, single access to
current->mempolicy to prevent potential race conditions.

3. Optimizing the scope of task_lock(current). The lock now only protects
the critical section where mempolicy is accessed, reducing the duration
the lock is held. This enhances performance by limiting the scope of the
lock to only what is necessary.

Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
 mm/mempolicy.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Nov. 21, 2024, 1:40 p.m. UTC | #1
On 21.11.24 13:24, Zhen Ni wrote:
> 1.Removing redundant checks for current->mempolicy, with a more concise
> check order.
> 
> 2.Using READ_ONCE(current->mempolicy) for safe, single access to
> current->mempolicy to prevent potential race conditions.
> 
> 3.Optimizing the scope of task_lock(current). The lock now only protects
> the critical section where mempolicy is accessed, reducing the duration
> the lock is held. This enhances performance by limiting the scope of the
> lock to only what is necessary.
> 
> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> ---
>   mm/mempolicy.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index b646fab3e45e..8bff8830b7e6 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2132,11 +2132,14 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
>   {
>   	struct mempolicy *mempolicy;
>   
> -	if (!(mask && current->mempolicy))
> +	if (!mask)
> +		return false;
> +
> +	mempolicy = READ_ONCE(current->mempolicy);
> +	if (!mempolicy)
>   		return false;
>   
>   	task_lock(current);
> -	mempolicy = current->mempolicy;
>   	switch (mempolicy->mode) {
>   	case MPOL_PREFERRED:
>   	case MPOL_PREFERRED_MANY:

(a) current->mempolicy can only change under the task_lock(), see 
do_set_mempolicy().

(b) current->mempolicy cannot usually become NULL once it was non-NULL.
There are two exceptions:

copy_process() might set it to NULL when creating the process, before 
the task gets scheduled.

do_exit() might set it to NULL via mpol_put_task_policy().

I don't think init_nodemask_of_mempolicy() can be called while our task 
is exiting ...


Now, if you read current->mempolicy outside of task_lock() only to 
*dereference* it later, when it might already have been changed and 
freed, that's a BUG you're introducing.


So, regarding your (1): which redundant "check"? There is a single "not 
NULL" check. Regarding your (2), I think you are introducing a BUG, and 
regarding your (3), please share performance numbers, or realize that you 
are making something up ;)


The only improvement we could make here is converting:
	if (!(mask && current->mempolicy))
into a more readable
	if (!mask || !current->mempolicy)

And maybe adding a comment that while we can race with someone changing 
current->mempolicy, we cannot race with someone changing it to NULL.
Gregory Price Nov. 21, 2024, 2:41 p.m. UTC | #2
On Thu, Nov 21, 2024 at 02:40:44PM +0100, David Hildenbrand wrote:
> On 21.11.24 13:24, Zhen Ni wrote:
> > 1.Removing redundant checks for current->mempolicy, with a more concise
> > check order.
> > 
> > 2.Using READ_ONCE(current->mempolicy) for safe, single access to
> > current->mempolicy to prevent potential race conditions.
> > 
> > 3.Optimizing the scope of task_lock(current). The lock now only protects
> > the critical section where mempolicy is accessed, reducing the duration
> > the lock is held. This enhances performance by limiting the scope of the
> > lock to only what is necessary.
> >

While optimizing task lock scopes is admirable, we're talking about a scale
of maybe a microsecond-long access on a lock that's typically held quite
a bit longer - if it's ever really held, given the context of this function.

What you'd actually want to do is the opposite:

task_lock(current);
mempolicy = get_mempolicy();
task_unlock(current);
/* do work */
put_mempolicy();

but even this is only optimizing a small amount of work.

And if you look at what init_nodemask_of_mempolicy is actually used for,
there is no need for this optimization - the task is almost certainly
short lived and intended to just adjust a system control value.

> > Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> > ---
> >   mm/mempolicy.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index b646fab3e45e..8bff8830b7e6 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2132,11 +2132,14 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
> >   {
> >   	struct mempolicy *mempolicy;
> > -	if (!(mask && current->mempolicy))
> > +	if (!mask)
> > +		return false;
> > +
> > +	mempolicy = READ_ONCE(current->mempolicy);
> > +	if (!mempolicy)
> >   		return false;
> >   	task_lock(current);
> > -	mempolicy = current->mempolicy;
> >   	switch (mempolicy->mode) {
> >   	case MPOL_PREFERRED:
> >   	case MPOL_PREFERRED_MANY:
> 
> (a) current->mempolicy can only change under the task_lock(), see
> do_set_mempolicy().
> 
> (b) current->mempolicy cannot usually become NULL once it was non-NULL. There are
> two exceptions:
> 
> copy_process() might set it to NULL when creating the process, before the
> task gets scheduled.
> 
> do_exit() might set it to NULL via mpol_put_task_policy().
> 
> I don't think init_nodemask_of_mempolicy() can be called while our task is
> exiting ...
>

Unless someone put an access to nr_hugepages_mempolicy in the exit path, I
doubt it (also that would probably deadlock :])
 
> 
> Now, if you read current->mempolicy outside of task_lock() only to
> *dereference* it later, when it might already have been changed and freed, that's a BUG
> you're introducing.
> 

100% a bug, although since the task mempolicy can't currently change outside
the context of the task - I can't think of a way to poke this particular bug.

~Gregory

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b646fab3e45e..8bff8830b7e6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2132,11 +2132,14 @@  bool init_nodemask_of_mempolicy(nodemask_t *mask)
 {
 	struct mempolicy *mempolicy;
 
-	if (!(mask && current->mempolicy))
+	if (!mask)
+		return false;
+
+	mempolicy = READ_ONCE(current->mempolicy);
+	if (!mempolicy)
 		return false;
 
 	task_lock(current);
-	mempolicy = current->mempolicy;
 	switch (mempolicy->mode) {
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY: