mm,page_alloc: PF_WQ_WORKER should always sleep at should_reclaim_retry().
diff mbox

Message ID 1531046158-4010-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
State New
Headers show

Commit Message

Tetsuo Handa July 8, 2018, 10:35 a.m. UTC
From: Michal Hocko <mhocko@suse.com>

should_reclaim_retry() should be a natural reschedule point. PF_WQ_WORKER
is a special case which needs a stronger rescheduling policy. However,
since schedule_timeout_uninterruptible(1) for PF_WQ_WORKER depends on
__zone_watermark_ok() == true, PF_WQ_WORKER is currently counting on
mutex_trylock(&oom_lock) == 0 in __alloc_pages_may_oom() which is a bad
expectation.

Doing schedule_timeout_uninterruptible(1) at should_reclaim_retry()
unconditionally seems more straightforward than depending on a zone being
a good candidate for a further reclaim.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

Comments

Michal Hocko July 9, 2018, 7:57 a.m. UTC | #1
On Sun 08-07-18 19:35:58, Tetsuo Handa wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> should_reclaim_retry() should be a natural reschedule point. PF_WQ_WORKER
> is a special case which needs a stronger rescheduling policy. However,
> since schedule_timeout_uninterruptible(1) for PF_WQ_WORKER depends on
> __zone_watermark_ok() == true, PF_WQ_WORKER is currently counting on
> mutex_trylock(&oom_lock) == 0 in __alloc_pages_may_oom() which is a bad
> expectation.

I think your reference to the oom_lock is more confusing than helpful
actually. I would simply use the following from your previous [1]
changelog:
: should_reclaim_retry() should be a natural reschedule point. PF_WQ_WORKER
: is a special case which needs a stronger rescheduling policy. Doing that
: unconditionally seems more straightforward than depending on a zone being
: a good candidate for a further reclaim.
: 
: Thus, move the short sleep when we are waiting for the owner of oom_lock
: (which coincidentally also serves as a guaranteed sleep for PF_WQ_WORKER
: threads) to should_reclaim_retry().

> unconditionally seems more straightforward than depending on a zone being
> a good candidate for a further reclaim.

[1] http://lkml.kernel.org/r/1528369223-7571-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp

[Tetsuo: changelog]
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Joonsoo Kim <js1304@gmail.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>

Your s-o-b is still missing.

> ---
>  mm/page_alloc.c | 34 ++++++++++++++++++----------------
>  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1521100..f56cc09 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3922,6 +3922,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  {
>  	struct zone *zone;
>  	struct zoneref *z;
> +	bool ret = false;
>  
>  	/*
>  	 * Costly allocations might have made a progress but this doesn't mean
> @@ -3985,25 +3986,26 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  				}
>  			}
>  
> -			/*
> -			 * Memory allocation/reclaim might be called from a WQ
> -			 * context and the current implementation of the WQ
> -			 * concurrency control doesn't recognize that
> -			 * a particular WQ is congested if the worker thread is
> -			 * looping without ever sleeping. Therefore we have to
> -			 * do a short sleep here rather than calling
> -			 * cond_resched().
> -			 */
> -			if (current->flags & PF_WQ_WORKER)
> -				schedule_timeout_uninterruptible(1);
> -			else
> -				cond_resched();
> -
> -			return true;
> +			ret = true;
> +			goto out;
>  		}
>  	}
>  
> -	return false;
> +out:
> +	/*
> +	 * Memory allocation/reclaim might be called from a WQ
> +	 * context and the current implementation of the WQ
> +	 * concurrency control doesn't recognize that
> +	 * a particular WQ is congested if the worker thread is
> +	 * looping without ever sleeping. Therefore we have to
> +	 * do a short sleep here rather than calling
> +	 * cond_resched().
> +	 */
> +	if (current->flags & PF_WQ_WORKER)
> +		schedule_timeout_uninterruptible(1);
> +	else
> +		cond_resched();
> +	return ret;
>  }
>  
>  static inline bool
> -- 
> 1.8.3.1
>
Tetsuo Handa July 9, 2018, 1:08 p.m. UTC | #2
On 2018/07/09 16:57, Michal Hocko wrote:
> On Sun 08-07-18 19:35:58, Tetsuo Handa wrote:
>> From: Michal Hocko <mhocko@suse.com>
>>
>> should_reclaim_retry() should be a natural reschedule point. PF_WQ_WORKER
>> is a special case which needs a stronger rescheduling policy. However,
>> since schedule_timeout_uninterruptible(1) for PF_WQ_WORKER depends on
>> __zone_watermark_ok() == true, PF_WQ_WORKER is currently counting on
>> mutex_trylock(&oom_lock) == 0 in __alloc_pages_may_oom() which is a bad
>> expectation.
> 
> I think your reference to the oom_lock is more confusing than helpful
> actually. I would simply use the following from your previous [1]
> changelog:

Then, you can post yourself because

> : should_reclaim_retry() should be a natural reschedule point. PF_WQ_WORKER
> : is a special case which needs a stronger rescheduling policy. Doing that
> : unconditionally seems more straightforward than depending on a zone being
> : a good candidate for a further reclaim.
> : 
> : Thus, move the short sleep when we are waiting for the owner of oom_lock
> : (which coincidentally also serves as a guaranteed sleep for PF_WQ_WORKER
> : threads) to should_reclaim_retry().
> 
>> unconditionally seems more straightforward than depending on a zone being
>> a good candidate for a further reclaim.
> 
> [1] http://lkml.kernel.org/r/1528369223-7571-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
> 
> [Tetsuo: changelog]
>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Joonsoo Kim <js1304@gmail.com>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
> 
> Your s-o-b is still missing.

all code changes in this patch is from you. That is, my s-o-b is not missing.
Michal Hocko July 9, 2018, 1:13 p.m. UTC | #3
On Mon 09-07-18 22:08:04, Tetsuo Handa wrote:
> On 2018/07/09 16:57, Michal Hocko wrote:
[...]
> > [Tetsuo: changelog]
> >> Signed-off-by: Michal Hocko <mhocko@suse.com>
> >> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> >> Cc: David Rientjes <rientjes@google.com>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> Cc: Joonsoo Kim <js1304@gmail.com>
> >> Cc: Mel Gorman <mgorman@suse.de>
> >> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
> >> Cc: Vlastimil Babka <vbabka@suse.cz>
> > 
> > Your s-o-b is still missing.
> 
> all code changes in this patch is from you. That is, my s-o-b is not missing.

You are supposed to add your s-o-b if you are reposting a patch as
non-author to record to sender path properly.
Matthew Wilcox July 9, 2018, 1:58 p.m. UTC | #4
On Mon, Jul 09, 2018 at 10:08:04PM +0900, Tetsuo Handa wrote:
> > [Tetsuo: changelog]
> >> Signed-off-by: Michal Hocko <mhocko@suse.com>
> >> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> >> Cc: David Rientjes <rientjes@google.com>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> Cc: Joonsoo Kim <js1304@gmail.com>
> >> Cc: Mel Gorman <mgorman@suse.de>
> >> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
> >> Cc: Vlastimil Babka <vbabka@suse.cz>
> > 
> > Your s-o-b is still missing.
> 
> all code changes in this patch is from you. That is, my s-o-b is not missing.

   12) When to use Acked-by:, Cc:, and Co-Developed-by:
   -------------------------------------------------------

   The Signed-off-by: tag indicates that the signer was involved in the
   development of the patch, or that he/she was in the patch's delivery path.

That is, if you're submitting it, it needs your S-o-b line.  That's
written down in Documentation/process/submitting-patches.rst.

Patch
diff mbox

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..f56cc09 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3922,6 +3922,7 @@  bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 {
 	struct zone *zone;
 	struct zoneref *z;
+	bool ret = false;
 
 	/*
 	 * Costly allocations might have made a progress but this doesn't mean
@@ -3985,25 +3986,26 @@  bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 				}
 			}
 
-			/*
-			 * Memory allocation/reclaim might be called from a WQ
-			 * context and the current implementation of the WQ
-			 * concurrency control doesn't recognize that
-			 * a particular WQ is congested if the worker thread is
-			 * looping without ever sleeping. Therefore we have to
-			 * do a short sleep here rather than calling
-			 * cond_resched().
-			 */
-			if (current->flags & PF_WQ_WORKER)
-				schedule_timeout_uninterruptible(1);
-			else
-				cond_resched();
-
-			return true;
+			ret = true;
+			goto out;
 		}
 	}
 
-	return false;
+out:
+	/*
+	 * Memory allocation/reclaim might be called from a WQ
+	 * context and the current implementation of the WQ
+	 * concurrency control doesn't recognize that
+	 * a particular WQ is congested if the worker thread is
+	 * looping without ever sleeping. Therefore we have to
+	 * do a short sleep here rather than calling
+	 * cond_resched().
+	 */
+	if (current->flags & PF_WQ_WORKER)
+		schedule_timeout_uninterruptible(1);
+	else
+		cond_resched();
+	return ret;
 }
 
 static inline bool