From patchwork Wed Nov 6 19:53:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 13865509
From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, linux-mm@kvack.org, Roman Gushchin,
 syzbot+e985d3026c4fd041578e@syzkaller.appspotmail.com, Hugh Dickins,
 stable@vger.kernel.org, Matthew Wilcox, Sean Christopherson,
 Vlastimil Babka
Subject: [PATCH v3] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
Date: Wed, 6 Nov 2024 19:53:54 +0000
Message-ID: <20241106195354.270757-1-roman.gushchin@linux.dev>

Syzbot reported a bad page state problem caused by a page being freed
using free_page() still having a mlocked flag at free_pages_prepare()
stage:

  BUG: Bad page state in process syz.5.504 pfn:61f45
  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x61f45
  flags: 0xfff00000080204(referenced|workingset|mlocked|node=0|zone=1|lastcpupid=0x7ff)
  raw: 00fff00000080204 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
  page_owner tracks the page as allocated
  page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 8443, tgid 8442 (syz.5.504), ts 201884660643, free_ts 201499827394
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
   kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
   kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
   kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5488 [inline]
   kvm_dev_ioctl+0x12dc/0x2240 virt/kvm/kvm_main.c:5530
   __do_compat_sys_ioctl fs/ioctl.c:1007 [inline]
   __se_compat_sys_ioctl+0x510/0xc90 fs/ioctl.c:950
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  page last free pid 8399 tgid 8399 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_folios+0xf12/0x18d0 mm/page_alloc.c:2686
   folios_put_refs+0x76c/0x860 mm/swap.c:1007
   free_pages_and_swap_cache+0x5c8/0x690 mm/swap_state.c:335
   __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
   tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
   tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
   tlb_flush_mmu+0x3a3/0x680 mm/mmu_gather.c:373
   tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:465
   exit_mmap+0x496/0xc40 mm/mmap.c:1926
   __mmput+0x115/0x390 kernel/fork.c:1348
   exit_mm+0x220/0x310 kernel/exit.c:571
   do_exit+0x9b2/0x28e0 kernel/exit.c:926
   do_group_exit+0x207/0x2c0 kernel/exit.c:1088
   __do_sys_exit_group kernel/exit.c:1099 [inline]
   __se_sys_exit_group kernel/exit.c:1097 [inline]
   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
   x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  Modules linked in:
  CPU: 0 UID: 0 PID: 8442 Comm: syz.5.504 Not tainted 6.12.0-rc6-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  Call Trace:
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
   bad_page+0x176/0x1d0 mm/page_alloc.c:501
   free_page_is_bad mm/page_alloc.c:918 [inline]
   free_pages_prepare mm/page_alloc.c:1100 [inline]
   free_unref_page+0xed0/0xf20 mm/page_alloc.c:2638
   kvm_destroy_vm virt/kvm/kvm_main.c:1327 [inline]
   kvm_put_kvm+0xc75/0x1350 virt/kvm/kvm_main.c:1386
   kvm_vcpu_release+0x54/0x60 virt/kvm/kvm_main.c:4143
   __fput+0x23f/0x880 fs/file_table.c:431
   task_work_run+0x24f/0x310 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xa2f/0x28e0 kernel/exit.c:939
   do_group_exit+0x207/0x2c0 kernel/exit.c:1088
   __do_sys_exit_group kernel/exit.c:1099 [inline]
   __se_sys_exit_group kernel/exit.c:1097 [inline]
   __ia32_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
   ia32_sys_call+0x2624/0x2630 arch/x86/include/generated/asm/syscalls_32.h:253
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  RIP: 0023:0xf745d579
  Code: Unable to access opcode bytes at 0xf745d54f.
  RSP: 002b:00000000f75afd6c EFLAGS: 00000206 ORIG_RAX: 00000000000000fc
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 00000000ffffff9c RDI: 00000000f744cff4
  RBP: 00000000f717ae61 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

The problem was originally introduced by commit b109b87050df
("mm/munlock: replace clear_page_mlock() by final clearance"): it was
focused on handling pagecache and anonymous memory and wasn't suitable
for the lower-level get_page()/free_page() APIs used, for example, by
KVM, as with this reproducer.

Fix it by moving the mlocked flag clearance down to
free_pages_prepare().

The bug itself is fairly old and harmless, aside from generating these
warnings and a small memory leak: "bad" pages are stopped from being
allocated again.

Reported-by: syzbot+e985d3026c4fd041578e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6729f475.050a0220.701a.0019.GAE@google.com
Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
Signed-off-by: Roman Gushchin
Acked-by: Hugh Dickins
Cc:
Cc: Hugh Dickins
Cc: Matthew Wilcox
Cc: Sean Christopherson
Cc: Vlastimil Babka
---
 mm/page_alloc.c | 15 +++++++++++++++
 mm/swap.c       | 14 --------------
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47048b39b8ca..371d1c6c1fc7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1048,6 +1048,7 @@ __always_inline bool free_pages_prepare(struct page *page,
 	bool skip_kasan_poison = should_skip_kasan_poison(page);
 	bool init = want_init_on_free();
 	bool compound = PageCompound(page);
+	struct folio *folio = page_folio(page);
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
@@ -1057,6 +1058,20 @@ __always_inline bool free_pages_prepare(struct page *page,
 	if (memcg_kmem_online() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);
 
+	/*
+	 * In rare cases, when truncation or holepunching raced with
+	 * munlock after VM_LOCKED was cleared, Mlocked may still be
+	 * found set here. This does not indicate a problem, unless
+	 * "unevictable_pgs_cleared" appears worryingly large.
+	 */
+	if (unlikely(folio_test_mlocked(folio))) {
+		long nr_pages = folio_nr_pages(folio);
+
+		__folio_clear_mlocked(folio);
+		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
+		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
+	}
+
 	if (unlikely(PageHWPoison(page)) && !order) {
 		/* Do not let hwpoison pages hit pcplists/buddy */
 		reset_page_owner(page, order);
diff --git a/mm/swap.c b/mm/swap.c
index 638a3f001676..10decd9dffa1 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
 		lruvec_del_folio(*lruvecp, folio);
 		__folio_clear_lru_flags(folio);
 	}
-
-	/*
-	 * In rare cases, when truncation or holepunching raced with
-	 * munlock after VM_LOCKED was cleared, Mlocked may still be
-	 * found set here. This does not indicate a problem, unless
-	 * "unevictable_pgs_cleared" appears worryingly large.
-	 */
-	if (unlikely(folio_test_mlocked(folio))) {
-		long nr_pages = folio_nr_pages(folio);
-
-		__folio_clear_mlocked(folio);
-		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
-		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
-	}
 }
 
 /*
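
For illustration only, the reasoning in the commit message can be
modelled in userspace. This is a rough sketch, not kernel code: every
name in it (struct mock_page, MOCK_PG_MLOCKED, mock_nr_mlock and the
mock_* functions) is a hypothetical stand-in invented here, and it
assumes only what the patch describes: some callers free a page
directly without passing through the LRU release helper, while every
free path reaches the common prepare step where the flag check runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page flag; not the kernel's PG_mlocked. */
#define MOCK_PG_MLOCKED (1u << 0)

struct mock_page {
	unsigned int flags;
};

/* Counter loosely modelled on the NR_MLOCK statistic. */
static long mock_nr_mlock;

/* Clear the mock flag and fix up the counter, if the flag is set. */
static void mock_clear_mlocked(struct mock_page *page)
{
	if (page->flags & MOCK_PG_MLOCKED) {
		page->flags &= ~MOCK_PG_MLOCKED;
		mock_nr_mlock--;
	}
}

/*
 * Common prepare step reached by every free path. Returns false when
 * the page still carries a flag that must not be set at free time,
 * which is the "bad page" case in this model.
 */
static bool mock_free_pages_prepare(struct mock_page *page, bool clear_here)
{
	if (clear_here)
		mock_clear_mlocked(page);
	return !(page->flags & MOCK_PG_MLOCKED);
}

/* LRU-style free: passes through the release helper first. */
static bool mock_free_via_lru(struct mock_page *page, bool clear_in_prepare)
{
	if (!clear_in_prepare)
		mock_clear_mlocked(page);	/* old location of the clearance */
	return mock_free_pages_prepare(page, clear_in_prepare);
}

/* Direct free: skips the LRU release helper entirely. */
static bool mock_free_direct(struct mock_page *page, bool clear_in_prepare)
{
	return mock_free_pages_prepare(page, clear_in_prepare);
}
```

In this model, with clear_in_prepare set to false (the old placement),
mock_free_direct() on a page with MOCK_PG_MLOCKED set returns false,
while mock_free_via_lru() returns true. With clear_in_prepare set to
true (the placement this patch moves to), both return true and the
counter is adjusted on either path.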