Message ID | 20240313234245.18824-1-osalvador@suse.de (mailing list archive) |
---|---|
State | New |
Series | mm,page_owner: Fix recursion |
On Thu, Mar 14, 2024 at 12:42:45AM +0100, Oscar Salvador wrote:
> @@ -232,6 +241,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
>  	alloc_handle = page_owner->handle;
>
>  	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> +
>  	for (i = 0; i < (1 << order); i++) {

Sigh, a last-minute unnoticed change.

@Andrew: Do you want me to send v2 fixing that up?
On 2024/03/14 8:42, Oscar Salvador wrote:
> Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
> the only place where page_owner could potentially go into recursion due to
> its need of allocating more memory was in save_stack(), which ends up calling
> into stackdepot code with the possibility of allocating memory.
>
> We made sure to guard against that by signaling that the current task was
> already in page_owner code, so in case a recursion attempt was made, we
> could catch that and return dummy_handle.
>
> After above commit, a new place in page_owner code was introduced where we
> could allocate memory, meaning we could go into recursion would we take that
> path.
>
> Make sure to signal that we are in page_owner in that codepath as well.
> Move the guard code into two helpers {un}set_current_in_page_owner()
> and use them prior to calling in the two functions that might allocate
> memory.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")

Maybe culprit for a page owner refcount bug reported at
https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 , for
that commit went to next-20240214 and syzbot started failing to test since next-20240215 ?

Please send this patch to linux-next.git as soon as possible (or can someone experiencing
this bug try booting linux-next.git with this patch applied, so that we can check whether
syzbot can resume testing linux-next.git), and then send to linux.git together (so that
various trees which depend on linux.git won't start failing to boot).
On Thu, Mar 14, 2024 at 12:01:24PM +0900, Tetsuo Handa wrote:
> Maybe culprit for a page owner refcount bug reported at
> https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 , for
> that commit went to next-20240214 and syzbot started failing to test since next-20240215 ?
>
> Please send this patch to linux-next.git as soon as possible (or can someone experiencing
> this bug try booting linux-next.git with this patch applied, so that we can check whether
> syzbot can resume testing linux-next.git), and then send to linux.git together (so that
> various trees which depend on linux.git won't start failing to boot).

No, that is something else that I already started fixing a few days ago.
I think I will have the fix ready today.
On Thu, Mar 14, 2024 at 06:47:43AM +0100, Oscar Salvador wrote:
> On Thu, Mar 14, 2024 at 12:01:24PM +0900, Tetsuo Handa wrote:
> > Maybe culprit for a page owner refcount bug reported at
> > https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 , for
> > that commit went to next-20240214 and syzbot started failing to test since next-20240215 ?
> >
> > Please send this patch to linux-next.git as soon as possible (or can someone experiencing
> > this bug try booting linux-next.git with this patch applied, so that we can check whether
> > syzbot can resume testing linux-next.git), and then send to linux.git together (so that
> > various trees which depend on linux.git won't start failing to boot).
>
> No, that is something else that I already started fixing a few days ago.
> I think I will have the fix ready today.

I already have the fix. I will do some more testing and then I will send
it out.

Thanks
On 3/14/24 00:42, Oscar Salvador wrote:
> Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
> the only place where page_owner could potentially go into recursion due to
> its need of allocating more memory was in save_stack(), which ends up calling
> into stackdepot code with the possibility of allocating memory.
>
> We made sure to guard against that by signaling that the current task was
> already in page_owner code, so in case a recursion attempt was made, we
> could catch that and return dummy_handle.
>
> After above commit, a new place in page_owner code was introduced where we
> could allocate memory, meaning we could go into recursion would we take that
> path.
>
> Make sure to signal that we are in page_owner in that codepath as well.
> Move the guard code into two helpers {un}set_current_in_page_owner()
> and use them prior to calling in the two functions that might allocate
> memory.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
> ---
>  mm/page_owner.c | 30 +++++++++++++++++++++---------
>  1 file changed, 21 insertions(+), 9 deletions(-)
>
> @@ -292,7 +302,9 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
>  		return;
>  	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
>  	page_ext_put(page_ext);
> +	set_current_in_page_owner();
>  	inc_stack_record_count(handle, gfp_mask);
> +	unset_current_in_page_owner();

This is because of the kmalloc() in add_stack_record_to_list() right? Why not
wrap just that then?

>  }
>
>  void __set_page_owner_migrate_reason(struct page *page, int reason)
On 2024/03/14 16:01, Oscar Salvador wrote:
> On Thu, Mar 14, 2024 at 06:47:43AM +0100, Oscar Salvador wrote:
>> On Thu, Mar 14, 2024 at 12:01:24PM +0900, Tetsuo Handa wrote:
>>> Maybe culprit for a page owner refcount bug reported at
>>> https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 , for
>>> that commit went to next-20240214 and syzbot started failing to test since next-20240215 ?
>>>
>>> Please send this patch to linux-next.git as soon as possible (or can someone experiencing
>>> this bug try booting linux-next.git with this patch applied, so that we can check whether
>>> syzbot can resume testing linux-next.git), and then send to linux.git together (so that
>>> various trees which depend on linux.git won't start failing to boot).
>>
>> No, that is something else that I already started fixing a few days ago.
>> I think I will have the fix ready today.
>
> I already have the fix. I will do some more testing and then I will send
> it out.

OK. Please test your patch using
https://syzkaller.appspot.com/bug?extid=98c1a1753a0731df2dd4 .
diff --git a/mm/page_owner.c b/mm/page_owner.c
index e96dd9092658..60663d657f7a 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -54,6 +54,22 @@ static depot_stack_handle_t early_handle;
 
 static void init_early_allocated_pages(void);
 
+static inline void set_current_in_page_owner(void)
+{
+	/*
+	 * Avoid recursion.
+	 *
+	 * We might need to allocate more memory from page_owner code, so make
+	 * sure to signal it in order to avoid recursion.
+	 */
+	current->in_page_owner = 1;
+}
+
+static inline void unset_current_in_page_owner(void)
+{
+	current->in_page_owner = 0;
+}
+
 static int __init early_page_owner_param(char *buf)
 {
 	int ret = kstrtobool(buf, &page_owner_enabled);
@@ -133,23 +149,16 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	depot_stack_handle_t handle;
 	unsigned int nr_entries;
 
-	/*
-	 * Avoid recursion.
-	 *
-	 * Sometimes page metadata allocation tracking requires more
-	 * memory to be allocated:
-	 * - when new stack trace is saved to stack depot
-	 */
 	if (current->in_page_owner)
 		return dummy_handle;
-	current->in_page_owner = 1;
 
+	set_current_in_page_owner();
 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
 	handle = stack_depot_save(entries, nr_entries, flags);
 	if (!handle)
 		handle = failure_handle;
+	unset_current_in_page_owner();
 
-	current->in_page_owner = 0;
 	return handle;
 }
 
@@ -232,6 +241,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	alloc_handle = page_owner->handle;
 
 	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+
 	for (i = 0; i < (1 << order); i++) {
 		__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
 		page_owner->free_handle = handle;
@@ -292,7 +302,9 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 		return;
 	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
 	page_ext_put(page_ext);
+	set_current_in_page_owner();
 	inc_stack_record_count(handle, gfp_mask);
+	unset_current_in_page_owner();
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
the only place where page_owner could potentially go into recursion due to
its need of allocating more memory was in save_stack(), which ends up calling
into stackdepot code with the possibility of allocating memory.

We made sure to guard against that by signaling that the current task was
already in page_owner code, so in case a recursion attempt was made, we
could catch that and return dummy_handle.

After above commit, a new place in page_owner code was introduced where we
could allocate memory, meaning we could go into recursion would we take that
path.

Make sure to signal that we are in page_owner in that codepath as well.
Move the guard code into two helpers {un}set_current_in_page_owner()
and use them prior to calling in the two functions that might allocate
memory.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
---
 mm/page_owner.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)
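
A minimal userspace sketch of the guard described in the commit message, under simplifying assumptions: a plain `bool` stands in for the per-task `current->in_page_owner` bit, `tracked_alloc()` is a hypothetical allocator whose tracking hook is reduced to a call to `save_stack()`, and the handle values and the `recursion_hits` counter are invented for illustration. Only the helper names and the placement of the guard follow the patch; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT kernel types or values. */
typedef unsigned int handle_t;
#define DUMMY_HANDLE 1u	/* returned when re-entry is detected */
#define REAL_HANDLE  2u	/* returned on the normal path */

/* Models current->in_page_owner (a per-task bit in the kernel). */
static bool in_page_owner;
/* Illustration only: counts re-entry attempts caught by the guard. */
static unsigned int recursion_hits;

static void set_current_in_page_owner(void)
{
	in_page_owner = true;
}

static void unset_current_in_page_owner(void)
{
	assert(in_page_owner);	/* the guard must be held when released */
	in_page_owner = false;
}

static handle_t save_stack(void);

/*
 * Assumed model of a tracked allocator: every allocation wants to record
 * its owner, which re-enters the tracking code through save_stack().
 */
static void *tracked_alloc(size_t size)
{
	void *p = malloc(size);

	(void)save_stack();
	return p;
}

static handle_t save_stack(void)
{
	if (in_page_owner) {
		recursion_hits++;
		return DUMMY_HANDLE;
	}

	set_current_in_page_owner();
	free(tracked_alloc(64));	/* models stack_depot_save() allocating */
	unset_current_in_page_owner();

	return REAL_HANDLE;
}

/* Models inc_stack_record_count(), which may allocate a list node. */
static void inc_stack_record_count(void)
{
	free(tracked_alloc(32));
}

/* Models __set_page_owner() with the guard placed as in the patch. */
handle_t set_page_owner(void)
{
	handle_t handle = save_stack();

	set_current_in_page_owner();
	inc_stack_record_count();
	unset_current_in_page_owner();

	return handle;
}
```

In this model a single call to `set_page_owner()` re-enters `save_stack()` twice, once from each allocating step. Both attempts find the flag set and get `DUMMY_HANDLE` back instead of nesting further, and the flag is clear again once the call returns.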