Message ID | 20240625022342.6158-1-qirui.001@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/memory-failure: allow memory allocation from emergency reserves | expand |
On Tue, 25 Jun 2024 10:23:42 +0800 Rui Qi <qirui.001@bytedance.com> wrote: > we hope that memory errors can be successfully handled quickly, using > __GFP_MEMALLOC can help us improve the success rate of processing > under memory pressure, because to_kill struct is freed very quickly, > so using __GFP_MEMALLOC will not exacerbate memory pressure for a long time, > and more memory will be freed after killed task exiting, which will also > reduce memory pressure. Has this been observed to improve behavior in some situation? In other words, please fully describe the observed runtime effects of this change.
On 2024/6/25 10:23, Rui Qi wrote: > From: Rui Qi <qirui.001@bytedance.com> > > we hope that memory errors can be successfully handled quickly, using > __GFP_MEMALLOC can help us improve the success rate of processing Comments of __GFP_MEMALLOC says: * Users of this flag have to be extremely careful to not deplete the reserve * completely and implement a throttling mechanism which controls the * consumption of the reserve based on the amount of freed memory. It seems there's no such throttling mechanism in memory_failure. > under memory pressure, because to_kill struct is freed very quickly, > so using __GFP_MEMALLOC will not exacerbate memory pressure for a long time, > and more memory will be freed after killed task exiting, which will also Tasks might not be killed even to_kill struct is allocated. > reduce memory pressure. > > Signed-off-by: Rui Qi <qirui.001@bytedance.com> > --- > mm/memory-failure.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 05818d09b4eb..0608383f927a 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -451,7 +451,7 @@ static void __add_to_kill(struct task_struct *tsk, struct page *p, > { > struct to_kill *tk; > > - tk = kmalloc(sizeof(struct to_kill), GFP_ATOMIC); > + tk = kmalloc(sizeof(struct to_kill), GFP_ATOMIC | __GFP_MEMALLOC); > if (!tk) { > pr_err("Out of memory while machine check handling\n"); > return; > @@ -1931,7 +1931,7 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page) > return -EHWPOISON; > } > > - raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC); > + raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC | __GFP_MEMALLOC); In already hardware poisoned code path, raw_hwp can be allocated to store raw page info without killing anything. So __GFP_MEMALLOC might not be suitable to use. Or am I miss something? Thanks. .
On Sat, 29 Jun 2024 10:09:46 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > On 2024/6/25 10:23, Rui Qi wrote: > > From: Rui Qi <qirui.001@bytedance.com> > > > > we hope that memory errors can be successfully handled quickly, using > > __GFP_MEMALLOC can help us improve the success rate of processing > > Comments of __GFP_MEMALLOC says: > > * Users of this flag have to be extremely careful to not deplete the reserve > * completely and implement a throttling mechanism which controls the > * consumption of the reserve based on the amount of freed memory. > > It seems there's no such throttling mechanism in memory_failure. > > > under memory pressure, because to_kill struct is freed very quickly, > > so using __GFP_MEMALLOC will not exacerbate memory pressure for a long time, > > and more memory will be freed after killed task exiting, which will also > > Tasks might not be killed even to_kill struct is allocated. > > ... > > > - raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC); > > + raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC | __GFP_MEMALLOC); > > In already hardware poisoned code path, raw_hwp can be allocated to store raw page info > without killing anything. So __GFP_MEMALLOC might not be suitable to use. > Or am I miss something? Yes, I'm doubtful about this patch. I think that rather than poking at a particular implementation, it would be helpful for us to see a complete description of the issues which were observed, please. Let's see the bug report and we can discuss fixes later.
On 2024/7/2 15:19, Andrew Morton wrote: > On Sat, 29 Jun 2024 10:09:46 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > >> On 2024/6/25 10:23, Rui Qi wrote: >>> From: Rui Qi <qirui.001@bytedance.com> >>> >>> we hope that memory errors can be successfully handled quickly, using >>> __GFP_MEMALLOC can help us improve the success rate of processing >> >> Comments of __GFP_MEMALLOC says: >> >> * Users of this flag have to be extremely careful to not deplete the reserve >> * completely and implement a throttling mechanism which controls the >> * consumption of the reserve based on the amount of freed memory. >> >> It seems there's no such throttling mechanism in memory_failure. >> >>> under memory pressure, because to_kill struct is freed very quickly, >>> so using __GFP_MEMALLOC will not exacerbate memory pressure for a long time, >>> and more memory will be freed after killed task exiting, which will also >> >> Tasks might not be killed even to_kill struct is allocated. >> >> ... >> >>> - raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC); >>> + raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC | __GFP_MEMALLOC); >> >> In already hardware poisoned code path, raw_hwp can be allocated to store raw page info >> without killing anything. So __GFP_MEMALLOC might not be suitable to use. >> Or am I miss something? > > Yes, I'm doubtful about this patch. I think that rather than poking at a > particular implementation, it would be helpful for us to see a complete > description of the issues which were observed, please. Let's see the > bug report and we can discuss fixes later. I agree with you, Andrew. Thanks. :) .
On Tue, 2 Jul 2024 16:04:02 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > >> Tasks might not be killed even to_kill struct is allocated. > >> > >> ... > >> > >>> - raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC); > >>> + raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC | __GFP_MEMALLOC); > >> > >> In already hardware poisoned code path, raw_hwp can be allocated to store raw page info > >> without killing anything. So __GFP_MEMALLOC might not be suitable to use. > >> Or am I miss something? > > > > Yes, I'm doubtful about this patch. I think that rather than poking at a > > particular implementation, it would be helpful for us to see a complete > > description of the issues which were observed, please. Let's see the > > bug report and we can discuss fixes later. > > I agree with you, Andrew. Thanks. :) I dropped the patch. Please let's proceed as discussed above.
On 2024/7/5 7:26, Andrew Morton wrote: > On Tue, 2 Jul 2024 16:04:02 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > >>>> Tasks might not be killed even to_kill struct is allocated. >>>> >>>> ... >>>> >>>>> - raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC); >>>>> + raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC | __GFP_MEMALLOC); >>>> >>>> In already hardware poisoned code path, raw_hwp can be allocated to store raw page info >>>> without killing anything. So __GFP_MEMALLOC might not be suitable to use. >>>> Or am I miss something? >>> >>> Yes, I'm doubtful about this patch. I think that rather than poking at a >>> particular implementation, it would be helpful for us to see a complete >>> description of the issues which were observed, please. Let's see the >>> bug report and we can discuss fixes later. >> >> I agree with you, Andrew. Thanks. :) > > I dropped the patch. Please let's proceed as discussed above. > . Sure. Thanks. .
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 05818d09b4eb..0608383f927a 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -451,7 +451,7 @@ static void __add_to_kill(struct task_struct *tsk, struct page *p, { struct to_kill *tk; - tk = kmalloc(sizeof(struct to_kill), GFP_ATOMIC); + tk = kmalloc(sizeof(struct to_kill), GFP_ATOMIC | __GFP_MEMALLOC); if (!tk) { pr_err("Out of memory while machine check handling\n"); return; @@ -1931,7 +1931,7 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page) return -EHWPOISON; } - raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC); + raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC | __GFP_MEMALLOC); if (raw_hwp) { raw_hwp->page = page; llist_add(&raw_hwp->node, head);