mm/oom_kill: revert watchdog reset in global OOM process

Message ID 20250212025707.67009-1-chenridong@huaweicloud.com (mailing list archive)
State New

Commit Message

Chen Ridong Feb. 12, 2025, 2:57 a.m. UTC
From: Chen Ridong <chenridong@huawei.com>

Unlike memcg OOM, which is relatively common, global OOM events are rare
and typically indicate that the entire system is under severe memory
pressure. Commit ade81479c7dd ("memcg: fix soft lockup in the OOM
process") added a touch_softlockup_watchdog() call to the global OOM handler
to suppress soft lockup issues. However, while this change can suppress
soft lockup warnings, it does not address RCU stalls, which can still be
detected and may cause unnecessary disturbances. Simply remove the
modification from the global OOM handler.

Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 mm/oom_kill.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

Chen Ridong Feb. 12, 2025, 3:24 a.m. UTC | #1
On 2025/2/12 10:57, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
> 
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>  
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }

Add discussion link:
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
Michal Hocko Feb. 12, 2025, 8:57 a.m. UTC | #2
On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
> 
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")

But this is not really fixing anything, is it? While this doesn't
address a potential RCU stall it doesn't address any actual problem.
So why do we want to do this?

> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>  
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }
> -- 
> 2.34.1
Chen Ridong Feb. 12, 2025, 9:19 a.m. UTC | #3
On 2025/2/12 16:57, Michal Hocko wrote:
> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>> and typically indicate that the entire system is under severe memory
>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>> process") added the touch_softlockup_watchdog in the global OOM handler to
>> suppress the soft lockup issues. However, while this change can suppress
>> soft lockup warnings, it does not address RCU stalls, which can still be
>> detected and may cause unnecessary disturbances. Simply remove the
>> modification from the global OOM handler.
>>
>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> 
> But this is not really fixing anything, is it? While this doesn't
> address a potential RCU stall it doesn't address any actual problem.
> So why do we want to do this?
> 


[1]
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/

As previously discussed, the work I have done on the global OOM is 'half
of the job'. Based on our discussions, I thought that it would be best
to abandon this approach for global OOM. Therefore, I am sending this
patch to revert the changes.

Or just leave it?

Best regards,
Ridong

>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  mm/oom_kill.c | 8 +-------
>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 25923cfec9c6..2d8b27604ef8 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,7 +44,6 @@
>>  #include <linux/init.h>
>>  #include <linux/mmu_notifier.h>
>>  #include <linux/cred.h>
>> -#include <linux/nmi.h>
>>  
>>  #include <asm/tlb.h>
>>  #include "internal.h"
>> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>  	else {
>>  		struct task_struct *p;
>> -		int i = 0;
>>  
>>  		rcu_read_lock();
>> -		for_each_process(p) {
>> -			/* Avoid potential softlockup warning */
>> -			if ((++i & 1023) == 0)
>> -				touch_softlockup_watchdog();
>> +		for_each_process(p)
>>  			dump_task(p, oc);
>> -		}
>>  		rcu_read_unlock();
>>  	}
>>  }
>> -- 
>> 2.34.1
>
Vlastimil Babka Feb. 12, 2025, 9:34 a.m. UTC | #4
On 2/12/25 10:19, Chen Ridong wrote:
> 
> 
> On 2025/2/12 16:57, Michal Hocko wrote:
>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>> and typically indicate that the entire system is under severe memory
>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>> suppress the soft lockup issues. However, while this change can suppress
>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>> detected and may cause unnecessary disturbances. Simply remove the
>>> modification from the global OOM handler.
>>>
>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>> 
>> But this is not really fixing anything, is it? While this doesn't
>> address a potential RCU stall it doesn't address any actual problem.
>> So why do we want to do this?
>> 
> 
> 
> [1]
> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> 
> As previously discussed, the work I have done on the global OOM is 'half
> of the job'. Based on our discussions, I thought that it would be best
> to abandon this approach for global OOM. Therefore, I am sending this
> patch to revert the changes.
> 
> Or just leave it?

I suggested that part doesn't need to be in the patch, but if it was merged
with it, we can just leave it there. Thanks.
Chen Ridong Feb. 12, 2025, 9:52 a.m. UTC | #5
On 2025/2/12 17:34, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
>>
>>
>> On 2025/2/12 16:57, Michal Hocko wrote:
>>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>>> and typically indicate that the entire system is under severe memory
>>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>>> suppress the soft lockup issues. However, while this change can suppress
>>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>>> detected and may cause unnecessary disturbances. Simply remove the
>>>> modification from the global OOM handler.
>>>>
>>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>>>
>>> But this is not really fixing anything, is it? While this doesn't
>>> address a potential RCU stall it doesn't address any actual problem.
>>> So why do we want to do this?
>>>
>>
>>
>> [1]
>> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
>>
>> As previously discussed, the work I have done on the global OOM is 'half
>> of the job'. Based on our discussions, I thought that it would be best
>> to abandon this approach for global OOM. Therefore, I am sending this
>> patch to revert the changes.
>>
>> Or just leave it?
> 
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

I see. Thank you very much.

Best regards,
Ridong
Michal Hocko Feb. 12, 2025, 11:58 a.m. UTC | #6
On Wed 12-02-25 10:34:06, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
> > 
> > 
> > On 2025/2/12 16:57, Michal Hocko wrote:
> >> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> >>> From: Chen Ridong <chenridong@huawei.com>
> >>>
> >>> Unlike memcg OOM, which is relatively common, global OOM events are rare
> >>> and typically indicate that the entire system is under severe memory
> >>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> >>> process") added the touch_softlockup_watchdog in the global OOM handler to
> >>> suppress the soft lockup issues. However, while this change can suppress
> >>> soft lockup warnings, it does not address RCU stalls, which can still be
> >>> detected and may cause unnecessary disturbances. Simply remove the
> >>> modification from the global OOM handler.
> >>>
> >>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> >> 
> >> But this is not really fixing anything, is it? While this doesn't
> >> address a potential RCU stall it doesn't address any actual problem.
> >> So why do we want to do this?
> >> 
> > 
> > 
> > [1]
> > https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> > 
> > As previously discussed, the work I have done on the global OOM is 'half
> > of the job'. Based on our discussions, I thought that it would be best
> > to abandon this approach for global OOM. Therefore, I am sending this
> > patch to revert the changes.
> > 
> > Or just leave it?
> 
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

Agreed!

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..2d8b27604ef8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,7 +44,6 @@ 
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
 #include <linux/cred.h>
-#include <linux/nmi.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -431,15 +430,10 @@  static void dump_tasks(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
 	else {
 		struct task_struct *p;
-		int i = 0;
 
 		rcu_read_lock();
-		for_each_process(p) {
-			/* Avoid potential softlockup warning */
-			if ((++i & 1023) == 0)
-				touch_softlockup_watchdog();
+		for_each_process(p)
 			dump_task(p, oc);
-		}
 		rcu_read_unlock();
 	}
 }
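
For readers unfamiliar with the idiom removed by this patch, the following is a minimal userspace sketch of the `(++i & 1023) == 0` throttling pattern, not kernel code. The names `touch_watchdog_stub()` and `walk_with_periodic_touch()` are hypothetical stand-ins introduced here for illustration; the stub only counts how often the real `touch_softlockup_watchdog()` would have been called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for touch_softlockup_watchdog(); it only counts calls. */
static unsigned long watchdog_touches;

static void touch_watchdog_stub(void)
{
	watchdog_touches++;
}

/*
 * Walk nr_items entries and call the stub once every `interval` entries.
 * `interval` must be a power of two so that (i & (interval - 1)) == 0
 * is equivalent to i % interval == 0. The reverted kernel code used 1024.
 * Returns the number of times the stub was called.
 */
static unsigned long walk_with_periodic_touch(size_t nr_items,
					      unsigned int interval)
{
	unsigned int i = 0;
	size_t n;

	assert(interval != 0 && (interval & (interval - 1)) == 0);

	watchdog_touches = 0;
	for (n = 0; n < nr_items; n++) {
		/* Same shape as the removed hunk: touch every `interval` items. */
		if ((++i & (interval - 1)) == 0)
			touch_watchdog_stub();
		/* Per-item work (dump_task() in the kernel) would go here. */
	}
	return watchdog_touches;
}
```

With an interval of 1024, walking 5000 items calls the stub 4 times (at items 1024, 2048, 3072 and 4096). As the commit message notes, this kind of periodic touch only quiets the soft lockup detector; the loop in dump_tasks() still runs under rcu_read_lock(), so it does nothing about RCU stall detection.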