
[resend] mm: compaction: optimize proactive compaction deferrals

Message ID 1626869599-25412-1-git-send-email-charante@codeaurora.org (mailing list archive)
State New
Series [resend] mm: compaction: optimize proactive compaction deferrals

Commit Message

Charan Teja Kalla July 21, 2021, 12:13 p.m. UTC
Vlastimil Babka figured out that when the fragmentation score does not go
down across a proactive compaction run, i.e. when no progress is made, the
next proactive compaction wakeup is deferred for 1 << COMPACT_MAX_DEFER_SHIFT
(i.e. 64) wakeups, each HPAGE_FRAG_CHECK_INTERVAL_MSEC (= 500 ms) apart. On
each of those wakeups kcompactd merely decrements the 'proactive_defer'
counter and goes back to sleep, i.e. it is woken up just to decrement a
counter. The same deferral time can be achieved by simply sleeping for
HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids the
unnecessary wakeups of the kcompactd thread and also removes the need for
the 'proactive_defer' counter.

Link: https://lore.kernel.org/linux-fsdevel/88abfdb6-2c13-b5a6-5b46-742d12d1c910@suse.cz/
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
---
 Changes in V1:
    o Removed the 'proactive_defer' counter by optimizing the proactive
      compaction deferrals
    o This is a resend; earlier it was clubbed with other changes posted
      at https://lore.kernel.org/patchwork/patch/1448789/

 mm/compaction.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)
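
For illustration only, and not part of the patch: a minimal user-space sketch,
assuming the values the commit message refers to (HPAGE_FRAG_CHECK_INTERVAL_MSEC
= 500 and COMPACT_MAX_DEFER_SHIFT = 6, implied by the "64" above), comparing the
total deferral reached by the old per-wakeup counter with the single long
timeout used here.

/*
 * Illustration only, not kernel code. Assumed values:
 * HPAGE_FRAG_CHECK_INTERVAL_MSEC = 500, COMPACT_MAX_DEFER_SHIFT = 6.
 */
#include <stdio.h>

#define HPAGE_FRAG_CHECK_INTERVAL_MSEC 500
#define COMPACT_MAX_DEFER_SHIFT 6

int main(void)
{
	unsigned int proactive_defer = 1 << COMPACT_MAX_DEFER_SHIFT;
	unsigned long old_total_ms = 0;
	unsigned long new_timeout_ms;
	unsigned int wakeups = 0;

	/* Old scheme: wake up every interval just to decrement the counter. */
	while (proactive_defer) {
		proactive_defer--;
		wakeups++;
		old_total_ms += HPAGE_FRAG_CHECK_INTERVAL_MSEC;
	}

	/* New scheme: one sleep whose length covers the whole deferral. */
	new_timeout_ms = (unsigned long)HPAGE_FRAG_CHECK_INTERVAL_MSEC <<
			 COMPACT_MAX_DEFER_SHIFT;

	printf("old: %u wakeups, %lu ms total deferral\n", wakeups, old_total_ms);
	printf("new: 1 wakeup, %lu ms total deferral\n", new_timeout_ms);
	return 0;
}

Both schemes arrive at the same 32 second deferral; the new one just gets there
with a single timer expiry instead of 64 wakeups.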

Comments

Andrew Morton July 21, 2021, 8:18 p.m. UTC | #1
On Wed, 21 Jul 2021 17:43:19 +0530 Charan Teja Reddy <charante@codeaurora.org> wrote:

> Vlastimil Babka figured out that when the fragmentation score does not go
> down across a proactive compaction run, i.e. when no progress is made, the
> next proactive compaction wakeup is deferred for 1 << COMPACT_MAX_DEFER_SHIFT
> (i.e. 64) wakeups, each HPAGE_FRAG_CHECK_INTERVAL_MSEC (= 500 ms) apart. On
> each of those wakeups kcompactd merely decrements the 'proactive_defer'
> counter and goes back to sleep, i.e. it is woken up just to decrement a
> counter. The same deferral time can be achieved by simply sleeping for
> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids the
> unnecessary wakeups of the kcompactd thread and also removes the need for
> the 'proactive_defer' counter.
> 
> @@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
>  
>  		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
>  		if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
> -			kcompactd_work_requested(pgdat),
> -			msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
> +			kcompactd_work_requested(pgdat), timeout)) {
>  
>  			psi_memstall_enter(&pflags);
>  			kcompactd_do_work(pgdat);
>  			psi_memstall_leave(&pflags);
> +			/*
> +			 * Reset the timeout value. The defer timeout by
> +			 * proactive compaction can effectively lost
> +			 * here but that is fine as the condition of the
> +			 * zone changed substantionally and carrying on
> +			 * with the previous defer is not useful.
> +			 */
> +			timeout = default_timeout;
>  			continue;

I find this comment hard to follow.  Is this better?

--- a/mm/compaction.c~mm-compaction-optimize-proactive-compaction-deferrals-fix
+++ a/mm/compaction.c
@@ -2909,11 +2909,11 @@ static int kcompactd(void *p)
 			kcompactd_do_work(pgdat);
 			psi_memstall_leave(&pflags);
 			/*
-			 * Reset the timeout value. The defer timeout by
-			 * proactive compaction can effectively lost
-			 * here but that is fine as the condition of the
-			 * zone changed substantionally and carrying on
-			 * with the previous defer is not useful.
+			 * Reset the timeout value. The defer timeout from
+			 * proactive compaction is lost here but that is fine
+			 * as the condition of the zone changing substantionally
+			 * then carrying on with the previous defer interval is
+			 * not useful.
 			 */
 			timeout = default_timeout;
 			continue;
Vlastimil Babka July 21, 2021, 9:29 p.m. UTC | #2
On 7/21/21 10:18 PM, Andrew Morton wrote:
> On Wed, 21 Jul 2021 17:43:19 +0530 Charan Teja Reddy <charante@codeaurora.org> wrote:
> 
>> Vlastimil Babka figured out that when the fragmentation score does not go
>> down across a proactive compaction run, i.e. when no progress is made, the
>> next proactive compaction wakeup is deferred for 1 << COMPACT_MAX_DEFER_SHIFT
>> (i.e. 64) wakeups, each HPAGE_FRAG_CHECK_INTERVAL_MSEC (= 500 ms) apart. On
>> each of those wakeups kcompactd merely decrements the 'proactive_defer'
>> counter and goes back to sleep, i.e. it is woken up just to decrement a
>> counter. The same deferral time can be achieved by simply sleeping for
>> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids the
>> unnecessary wakeups of the kcompactd thread and also removes the need for
>> the 'proactive_defer' counter.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

>>
>> @@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
>>  
>>  		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
>>  		if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
>> -			kcompactd_work_requested(pgdat),
>> -			msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
>> +			kcompactd_work_requested(pgdat), timeout)) {
>>  
>>  			psi_memstall_enter(&pflags);
>>  			kcompactd_do_work(pgdat);
>>  			psi_memstall_leave(&pflags);
>> +			/*
>> +			 * Reset the timeout value. The defer timeout by
>> +			 * proactive compaction can effectively lost
>> +			 * here but that is fine as the condition of the
>> +			 * zone changed substantionally and carrying on
>> +			 * with the previous defer is not useful.
>> +			 */
>> +			timeout = default_timeout;
>>  			continue;
> 
> I find this comment hard to follow.  Is this better?

Yes, thanks.

> --- a/mm/compaction.c~mm-compaction-optimize-proactive-compaction-deferrals-fix
> +++ a/mm/compaction.c
> @@ -2909,11 +2909,11 @@ static int kcompactd(void *p)
>  			kcompactd_do_work(pgdat);
>  			psi_memstall_leave(&pflags);
>  			/*
> -			 * Reset the timeout value. The defer timeout by
> -			 * proactive compaction can effectively lost
> -			 * here but that is fine as the condition of the
> -			 * zone changed substantionally and carrying on
> -			 * with the previous defer is not useful.
> +			 * Reset the timeout value. The defer timeout from
> +			 * proactive compaction is lost here but that is fine
> +			 * as the condition of the zone changing substantionally
> +			 * then carrying on with the previous defer interval is
> +			 * not useful.
>  			 */
>  			timeout = default_timeout;
>  			continue;
> _
>
Khalid Aziz July 21, 2021, 10:35 p.m. UTC | #3
On 7/21/21 6:13 AM, Charan Teja Reddy wrote:
> Vlastimil Babka figured out that when the fragmentation score does not go
> down across a proactive compaction run, i.e. when no progress is made, the
> next proactive compaction wakeup is deferred for 1 << COMPACT_MAX_DEFER_SHIFT
> (i.e. 64) wakeups, each HPAGE_FRAG_CHECK_INTERVAL_MSEC (= 500 ms) apart. On
> each of those wakeups kcompactd merely decrements the 'proactive_defer'
> counter and goes back to sleep, i.e. it is woken up just to decrement a
> counter. The same deferral time can be achieved by simply sleeping for
> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids the
> unnecessary wakeups of the kcompactd thread and also removes the need for
> the 'proactive_defer' counter.
> 
> Link: https://lore.kernel.org/linux-fsdevel/88abfdb6-2c13-b5a6-5b46-742d12d1c910@suse.cz/
> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>


Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>


> ---
>   Changes in V1:
>      o Removed the 'proactive_defer' counter by optimizing the proactive
>        compaction deferrals
>      o This is a resend; earlier it was clubbed with other changes posted
>        at https://lore.kernel.org/patchwork/patch/1448789/
> 
>   mm/compaction.c | 29 +++++++++++++++++++----------
>   1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 621508e..db00dbf 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2885,7 +2885,8 @@ static int kcompactd(void *p)
>   {
>   	pg_data_t *pgdat = (pg_data_t *)p;
>   	struct task_struct *tsk = current;
> -	unsigned int proactive_defer = 0;
> +	long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
> +	long timeout = default_timeout;
>   
>   	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
>   
> @@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
>   
>   		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
>   		if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
> -			kcompactd_work_requested(pgdat),
> -			msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
> +			kcompactd_work_requested(pgdat), timeout)) {
>   
>   			psi_memstall_enter(&pflags);
>   			kcompactd_do_work(pgdat);
>   			psi_memstall_leave(&pflags);
> +			/*
> +			 * Reset the timeout value. The defer timeout by
> +			 * proactive compaction can effectively lost
> +			 * here but that is fine as the condition of the
> +			 * zone changed substantionally and carrying on
> +			 * with the previous defer is not useful.
> +			 */
> +			timeout = default_timeout;
>   			continue;
>   		}
>   
> -		/* kcompactd wait timeout */
> +		/*
> +		 * Start the proactive work with default timeout. Based
> +		 * on the fragmentation score, this timeout is updated.
> +		 */
> +		timeout = default_timeout;
>   		if (should_proactive_compact_node(pgdat)) {
>   			unsigned int prev_score, score;
>   
> -			if (proactive_defer) {
> -				proactive_defer--;
> -				continue;
> -			}
>   			prev_score = fragmentation_score_node(pgdat);
>   			proactive_compact_node(pgdat);
>   			score = fragmentation_score_node(pgdat);
> @@ -2926,8 +2934,9 @@ static int kcompactd(void *p)
>   			 * Defer proactive compaction if the fragmentation
>   			 * score did not go down i.e. no progress made.
>   			 */
> -			proactive_defer = score < prev_score ?
> -					0 : 1 << COMPACT_MAX_DEFER_SHIFT;
> +			if (unlikely(score >= prev_score))
> +				timeout =
> +				   default_timeout << COMPACT_MAX_DEFER_SHIFT;
>   		}
>   	}
>   
>
David Rientjes July 26, 2021, 1:47 a.m. UTC | #4
On Wed, 21 Jul 2021, Charan Teja Reddy wrote:

> Vlastimil Babka figured out that when the fragmentation score does not go
> down across a proactive compaction run, i.e. when no progress is made, the
> next proactive compaction wakeup is deferred for 1 << COMPACT_MAX_DEFER_SHIFT
> (i.e. 64) wakeups, each HPAGE_FRAG_CHECK_INTERVAL_MSEC (= 500 ms) apart. On
> each of those wakeups kcompactd merely decrements the 'proactive_defer'
> counter and goes back to sleep, i.e. it is woken up just to decrement a
> counter. The same deferral time can be achieved by simply sleeping for
> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids the
> unnecessary wakeups of the kcompactd thread and also removes the need for
> the 'proactive_defer' counter.
> 
> Link: https://lore.kernel.org/linux-fsdevel/88abfdb6-2c13-b5a6-5b46-742d12d1c910@suse.cz/
> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>

With Andrew's comment fixup:

Acked-by: David Rientjes <rientjes@google.com>

Thanks, Charan.

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 621508e..db00dbf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2885,7 +2885,8 @@  static int kcompactd(void *p)
 {
 	pg_data_t *pgdat = (pg_data_t *)p;
 	struct task_struct *tsk = current;
-	unsigned int proactive_defer = 0;
+	long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
+	long timeout = default_timeout;
 
 	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
@@ -2902,23 +2903,30 @@  static int kcompactd(void *p)
 
 		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
 		if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
-			kcompactd_work_requested(pgdat),
-			msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
+			kcompactd_work_requested(pgdat), timeout)) {
 
 			psi_memstall_enter(&pflags);
 			kcompactd_do_work(pgdat);
 			psi_memstall_leave(&pflags);
+			/*
+			 * Reset the timeout value. The defer timeout by
+			 * proactive compaction can effectively lost
+			 * here but that is fine as the condition of the
+			 * zone changed substantionally and carrying on
+			 * with the previous defer is not useful.
+			 */
+			timeout = default_timeout;
 			continue;
 		}
 
-		/* kcompactd wait timeout */
+		/*
+		 * Start the proactive work with default timeout. Based
+		 * on the fragmentation score, this timeout is updated.
+		 */
+		timeout = default_timeout;
 		if (should_proactive_compact_node(pgdat)) {
 			unsigned int prev_score, score;
 
-			if (proactive_defer) {
-				proactive_defer--;
-				continue;
-			}
 			prev_score = fragmentation_score_node(pgdat);
 			proactive_compact_node(pgdat);
 			score = fragmentation_score_node(pgdat);
@@ -2926,8 +2934,9 @@  static int kcompactd(void *p)
 			 * Defer proactive compaction if the fragmentation
 			 * score did not go down i.e. no progress made.
 			 */
-			proactive_defer = score < prev_score ?
-					0 : 1 << COMPACT_MAX_DEFER_SHIFT;
+			if (unlikely(score >= prev_score))
+				timeout =
+				   default_timeout << COMPACT_MAX_DEFER_SHIFT;
 		}
 	}
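
As a quick cross-check of the deferred timeout set on the no-progress path
above (assuming COMPACT_MAX_DEFER_SHIFT = 6, which the "64" in the commit
message implies, and HPAGE_FRAG_CHECK_INTERVAL_MSEC = 500 ms):

  timeout = default_timeout << COMPACT_MAX_DEFER_SHIFT
          = msecs_to_jiffies(500) << 6, the jiffies equivalent of 32000 ms (~32 s)

  old scheme: (1 << 6) skipped wakeups * 500 ms = 64 * 500 ms = 32000 ms

so the single long sleep matches the total deferral of the previous
counter-based scheme while waking kcompactd only once.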