Message ID | 20250212025707.67009-1-chenridong@huaweicloud.com (mailing list archive) |
---|---|
State | New |
Series | mm/oom_kill: revert watchdog reset in global OOM process |
On 2025/2/12 10:57, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
>
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }

Add discussion link:
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/

On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
>
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")

But this is not really fixing anything, is it? While this doesn't
address a potential RCU stall it doesn't address any actual problem.
So why do we want to do this?

> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }
> --
> 2.34.1

On 2025/2/12 16:57, Michal Hocko wrote:
> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>> and typically indicate that the entire system is under severe memory
>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>> process") added the touch_softlockup_watchdog in the global OOM handler to
>> suppress the soft lockup issues. However, while this change can suppress
>> soft lockup warnings, it does not address RCU stalls, which can still be
>> detected and may cause unnecessary disturbances. Simply remove the
>> modification from the global OOM handler.
>>
>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>
> But this is not really fixing anything, is it? While this doesn't
> address a potential RCU stall it doesn't address any actual problem.
> So why do we want to do this?
>

[1] https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/

As previously discussed, the work I have done on the global OOM is 'half
of the job'. Based on our discussions, I thought that it would be best
to abandon this approach for global OOM. Therefore, I am sending this
patch to revert the changes.

Or just leave it?

Best regards,
Ridong

>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  mm/oom_kill.c | 8 +-------
>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 25923cfec9c6..2d8b27604ef8 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,7 +44,6 @@
>>  #include <linux/init.h>
>>  #include <linux/mmu_notifier.h>
>>  #include <linux/cred.h>
>> -#include <linux/nmi.h>
>>
>>  #include <asm/tlb.h>
>>  #include "internal.h"
>> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>  	else {
>>  		struct task_struct *p;
>> -		int i = 0;
>>
>>  		rcu_read_lock();
>> -		for_each_process(p) {
>> -			/* Avoid potential softlockup warning */
>> -			if ((++i & 1023) == 0)
>> -				touch_softlockup_watchdog();
>> +		for_each_process(p)
>>  			dump_task(p, oc);
>> -		}
>>  		rcu_read_unlock();
>>  	}
>>  }
>> --
>> 2.34.1
>

On 2/12/25 10:19, Chen Ridong wrote:
>
>
> On 2025/2/12 16:57, Michal Hocko wrote:
>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>> and typically indicate that the entire system is under severe memory
>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>> suppress the soft lockup issues. However, while this change can suppress
>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>> detected and may cause unnecessary disturbances. Simply remove the
>>> modification from the global OOM handler.
>>>
>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>>
>> But this is not really fixing anything, is it? While this doesn't
>> address a potential RCU stall it doesn't address any actual problem.
>> So why do we want to do this?
>>
>
>
> [1]
> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
>
> As previously discussed, the work I have done on the global OOM is 'half
> of the job'. Based on our discussions, I thought that it would be best
> to abandon this approach for global OOM. Therefore, I am sending this
> patch to revert the changes.
>
> Or just leave it?

I suggested that part doesn't need to be in the patch, but if it was merged
with it, we can just leave it there. Thanks.

On 2025/2/12 17:34, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
>>
>>
>> On 2025/2/12 16:57, Michal Hocko wrote:
>>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>>> and typically indicate that the entire system is under severe memory
>>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>>> suppress the soft lockup issues. However, while this change can suppress
>>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>>> detected and may cause unnecessary disturbances. Simply remove the
>>>> modification from the global OOM handler.
>>>>
>>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>>>
>>> But this is not really fixing anything, is it? While this doesn't
>>> address a potential RCU stall it doesn't address any actual problem.
>>> So why do we want to do this?
>>>
>>
>>
>> [1]
>> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
>>
>> As previously discussed, the work I have done on the global OOM is 'half
>> of the job'. Based on our discussions, I thought that it would be best
>> to abandon this approach for global OOM. Therefore, I am sending this
>> patch to revert the changes.
>>
>> Or just leave it?
>
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

I see. Thank you very much.

Best regards,
Ridong

On Wed 12-02-25 10:34:06, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
> >
> >
> > On 2025/2/12 16:57, Michal Hocko wrote:
> >> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> >>> From: Chen Ridong <chenridong@huawei.com>
> >>>
> >>> Unlike memcg OOM, which is relatively common, global OOM events are rare
> >>> and typically indicate that the entire system is under severe memory
> >>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> >>> process") added the touch_softlockup_watchdog in the global OOM handler to
> >>> suppress the soft lockup issues. However, while this change can suppress
> >>> soft lockup warnings, it does not address RCU stalls, which can still be
> >>> detected and may cause unnecessary disturbances. Simply remove the
> >>> modification from the global OOM handler.
> >>>
> >>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> >>
> >> But this is not really fixing anything, is it? While this doesn't
> >> address a potential RCU stall it doesn't address any actual problem.
> >> So why do we want to do this?
> >>
> >
> >
> > [1]
> > https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> >
> > As previously discussed, the work I have done on the global OOM is 'half
> > of the job'. Based on our discussions, I thought that it would be best
> > to abandon this approach for global OOM. Therefore, I am sending this
> > patch to revert the changes.
> >
> > Or just leave it?
>
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

Agreed!

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..2d8b27604ef8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,7 +44,6 @@
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
 #include <linux/cred.h>
-#include <linux/nmi.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
 	else {
 		struct task_struct *p;
-		int i = 0;
 
 		rcu_read_lock();
-		for_each_process(p) {
-			/* Avoid potential softlockup warning */
-			if ((++i & 1023) == 0)
-				touch_softlockup_watchdog();
+		for_each_process(p)
 			dump_task(p, oc);
-		}
 		rcu_read_unlock();
 	}
 }
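
For readers unfamiliar with the idiom in the removed hunk, here is a
minimal userspace sketch of how a "(++i & 1023) == 0" test throttles a
call to once per 1024 loop iterations. It is not kernel code:
touch_watchdog_stub() and scan_items() are hypothetical names made up for
illustration, and the stub merely counts calls in place of the kernel's
touch_softlockup_watchdog().

```c
/* Hypothetical stand-in for touch_softlockup_watchdog(): counts calls only. */
static unsigned int watchdog_touches;

static void touch_watchdog_stub(void)
{
	watchdog_touches++;
}

/*
 * Walk nr_items entries and touch the stub once every 1024 items, using
 * the same "(++i & 1023) == 0" test as the hunk removed by the patch.
 * Returns how many times the stub was touched during the walk.
 */
static unsigned int scan_items(unsigned int nr_items)
{
	unsigned int n;
	int i = 0;

	watchdog_touches = 0;
	for (n = 0; n < nr_items; n++) {
		/* Low ten bits of i are all zero only at multiples of 1024. */
		if ((++i & 1023) == 0)
			touch_watchdog_stub();
		/* Per-item work (dump_task() in the kernel) would go here. */
	}
	return watchdog_touches;
}
```

Under these assumptions, scan_items(5000) touches the stub four times, at
items 1024, 2048, 3072 and 4096. The mask works because 1024 is a power
of two, so "i & 1023" is zero exactly when i is a multiple of 1024. As
the commit message notes, touching the softlockup watchdog this way does
nothing about RCU stalls, since the whole walk still runs inside a single
rcu_read_lock() section.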