From patchwork Thu Dec 19 11:52:09 2024
X-Patchwork-Submitter: Miaohe Lin
X-Patchwork-Id: 13914998
From: Miaohe Lin
Subject: [PATCH v2] mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
Date: Thu, 19 Dec 2024 19:52:09 +0800
Message-ID: <20241219115209.574065-1-linmiaohe@huawei.com>

While running memory failure tests recently, I hit the following panic:

 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
 kernel BUG at include/linux/page-flags.h:616!
 Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 3 PID: 720 Comm: bash Not tainted 6.10.0-rc1-00195-g148743902568 #40
 RIP: 0010:unpoison_memory+0x2f3/0x590
 RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
 RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
 RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
 R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
 R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
 FS:  00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
 Call Trace:
  unpoison_memory+0x2f3/0x590
  simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
  debugfs_attr_write+0x42/0x60
  full_proxy_write+0x5b/0x80
  vfs_write+0xd5/0x540
  ksys_write+0x64/0xe0
  do_syscall_64+0xb9/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7f08f0314887
 RSP: 002b:00007ffece710078 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f08f0314887
 RDX: 0000000000000009 RSI: 0000564787a30410 RDI: 0000000000000001
 RBP: 0000564787a30410 R08: 000000000000fefe R09: 000000007fffffff
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
 R13: 00007f08f041b780 R14: 00007f08f0417600 R15: 00007f08f0416a00
 Modules linked in: hwpoison_inject
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:unpoison_memory+0x2f3/0x590
 RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
 RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
 RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
 R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
 R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
 FS:  00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception ]---

The root cause is that unpoison_memory() checks the PG_HWPoison flag of
an uninitialized page, which triggers VM_BUG_ON_PAGE(PagePoisoned(page)).

This can be reproduced with the following steps:
1. Offline a memory block:
   echo offline > /sys/devices/system/memory/memory12/state
2. Get the pfn of the offlined memory:
   page-types -b n -rlN
3. Write the pfn to unpoison-pfn:
   echo > /sys/kernel/debug/hwpoison/unpoison-pfn

Fix this by looking the page up with pfn_to_online_page(), and falling
back to pfn_to_page() only when the pfn is valid and covered by a
dev_pagemap.

Signed-off-by: Miaohe Lin
---
v2:
  Use pfn_to_online_page per David. Thanks.
---
 mm/memory-failure.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a7b8ccd29b6f..02be0596ce67 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2556,10 +2556,18 @@ int unpoison_memory(unsigned long pfn)
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
-	if (!pfn_valid(pfn))
-		return -ENXIO;
+	p = pfn_to_online_page(pfn);
+	if (!p) {
+		struct dev_pagemap *pgmap;
 
-	p = pfn_to_page(pfn);
+		if (!pfn_valid(pfn))
+			return -ENXIO;
+		pgmap = get_dev_pagemap(pfn, NULL);
+		if (!pgmap)
+			return -ENXIO;
+		put_dev_pagemap(pgmap);
+		p = pfn_to_page(pfn);
+	}
 	folio = page_folio(p);
 
 	mutex_lock(&mf_mutex);
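
For illustration only, the order of checks introduced by this patch can be
modelled in userspace as below. This is a rough sketch, not kernel code:
every model_* name, the four-entry pfn table, and the four memory states are
hypothetical stand-ins for pfn_to_online_page(), pfn_valid() and
get_dev_pagemap(). It assumes that an offlined memory block still has a
memmap (so pfn_valid() succeeds) but has neither an online page nor a
dev_pagemap.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states a pfn can be in; invented for this sketch. */
enum model_state {
	MODEL_HOLE,	/* no memmap at all: pfn_valid() would fail */
	MODEL_ONLINE,	/* online memory: struct page is initialized */
	MODEL_OFFLINE,	/* memmap exists, but the memory block is offline */
	MODEL_DEVICE,	/* device memory covered by a dev_pagemap */
};

#define MODEL_NR_PFNS 4

/* Toy "memory map": pfn 0..3 each sit in one of the states above. */
static const enum model_state model_map[MODEL_NR_PFNS] = {
	MODEL_HOLE, MODEL_ONLINE, MODEL_OFFLINE, MODEL_DEVICE,
};

static enum model_state model_state_of(unsigned long pfn)
{
	return pfn < MODEL_NR_PFNS ? model_map[pfn] : MODEL_HOLE;
}

/* Stand-in for pfn_to_online_page() returning non-NULL. */
static bool model_pfn_online(unsigned long pfn)
{
	return model_state_of(pfn) == MODEL_ONLINE;
}

/* Stand-in for pfn_valid(). */
static bool model_pfn_valid(unsigned long pfn)
{
	return model_state_of(pfn) != MODEL_HOLE;
}

/* Stand-in for get_dev_pagemap() returning non-NULL. */
static bool model_has_pgmap(unsigned long pfn)
{
	return model_state_of(pfn) == MODEL_DEVICE;
}

/*
 * Mirrors the order of checks in the patched lookup: returns 0 when the
 * page may be inspected, -ENXIO when it must not be touched.
 */
int model_unpoison_lookup(unsigned long pfn)
{
	if (model_pfn_online(pfn))
		return 0;		/* online page: safe to use */
	if (!model_pfn_valid(pfn))
		return -ENXIO;		/* no memmap for this pfn */
	if (!model_has_pgmap(pfn))
		return -ENXIO;		/* offline memory: page uninitialized */
	return 0;			/* device page: fall back to pfn_to_page() */
}
```

In this model only the online and device pfns pass the lookup. The offline
pfn, which the old pfn_valid()-only check would have let through, is now
rejected with -ENXIO before any page flags are read.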