From patchwork Thu Apr 10 09:10:20 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 14046171
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev,
 David Hildenbrand, Alison Schofield, Alexander Viro, Christian Brauner,
 Jan Kara, Dan Williams, Matthew Wilcox, Andrew Morton, Alistair Popple,
 Christoph Hellwig
Subject: [PATCH v1] fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages
Date: Thu, 10 Apr 2025 11:10:20 +0200
Message-ID: <20250410091020.119116-1-david@redhat.com>
X-Mailer: git-send-email 2.48.1

Alison reports an issue with fsdax when large extents end up using large
ZONE_DEVICE folios:

[  417.796271] BUG: kernel NULL pointer dereference, address: 0000000000000b00
[  417.796982] #PF: supervisor read access in kernel mode
[  417.797540] #PF: error_code(0x0000) - not-present page
[  417.798123] PGD 2a5c5067 P4D 2a5c5067 PUD 2a5c6067 PMD 0
[  417.798690] Oops: Oops: 0000 [#1] SMP NOPTI
[  417.799178] CPU: 5 UID: 0 PID: 1515 Comm: mmap Tainted: ...
[  417.800150] Tainted: [O]=OOT_MODULE
[  417.800583] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[  417.801358] RIP: 0010:__lruvec_stat_mod_folio+0x7e/0x250
[  417.801948] Code: ...
[  417.803662] RSP: 0000:ffffc90002be3a08 EFLAGS: 00010206
[  417.804234] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000002
[  417.804984] RDX: ffffffff815652d7 RSI: 0000000000000000 RDI: ffffffff82a2beae
[  417.805689] RBP: ffffc90002be3a28 R08: 0000000000000000 R09: 0000000000000000
[  417.806384] R10: ffffea0007000040 R11: ffff888376ffe000 R12: 0000000000000001
[  417.807099] R13: 0000000000000012 R14: ffff88807fe4ab40 R15: ffff888029210580
[  417.807801] FS:  00007f339fa7a740(0000) GS:ffff8881fa9b9000(0000) knlGS:0000000000000000
[  417.808570] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  417.809193] CR2: 0000000000000b00 CR3: 000000002a4f0004 CR4: 0000000000370ef0
[  417.809925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  417.810622] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  417.811353] Call Trace:
[  417.811709]  <TASK>
[  417.812038]  folio_add_file_rmap_ptes+0x143/0x230
[  417.812566]  insert_page_into_pte_locked+0x1ee/0x3c0
[  417.813132]  insert_page+0x78/0xf0
[  417.813558]  vmf_insert_page_mkwrite+0x55/0xa0
[  417.814088]  dax_fault_iter+0x484/0x7b0
[  417.814542]  dax_iomap_pte_fault+0x1ca/0x620
[  417.815055]  dax_iomap_fault+0x39/0x40
[  417.815499]  __xfs_write_fault+0x139/0x380
[  417.815995]  ? __handle_mm_fault+0x5e5/0x1a60
[  417.816483]  xfs_write_fault+0x41/0x50
[  417.816966]  xfs_filemap_fault+0x3b/0xe0
[  417.817424]  __do_fault+0x31/0x180
[  417.817859]  __handle_mm_fault+0xee1/0x1a60
[  417.818325]  ? debug_smp_processor_id+0x17/0x20
[  417.818844]  handle_mm_fault+0xe1/0x2b0
[...]

The issue is that when we split a large ZONE_DEVICE folio to order-0
ones, we don't reset the order/_nr_pages.
As folio->_nr_pages overlays page[1]->memcg_data, once page[1] is a
folio, it suddenly looks like it has folio->memcg_data set. And we never
manually initialize folio->memcg_data in fsdax code, because we never
expect it to be set at all.

When __lruvec_stat_mod_folio() then stumbles over such a folio, it tries
to use folio->memcg_data (because it's non-NULL) but it does not actually
point at a memcg, resulting in the problem.

Alison also observed that these folios sometimes have "locked" set, which
is rather concerning (folios locked from the beginning ...). The reason is
that the order for large folios is stored in page[1]->flags, which become
the folio->flags of a new small folio.

Let's fix it by adding a folio helper to clear order/_nr_pages for
splitting purposes.

Maybe we should reinitialize other large folio flags / folio members as
well when splitting, because they might similarly cause harm once page[1]
becomes a folio? At least other flags in PAGE_FLAGS_SECOND should not be
set for fsdax, so at least page[1]->flags might be as expected with this
fix.

From a quick glimpse, initializing ->mapping, ->pgmap and ->share should
re-initialize most things from a previous page[1] used by large folios
that fsdax cares about. For example folio->private might not get
reinitialized, but maybe that's not relevant -- no traces of its use in
fsdax code. Needs a closer look.

Another thing that should be considered in the future is performing
similar checks as we perform in free_tail_page_prepare() -- checking
pincount etc. -- when freeing a large fsdax folio.

Fixes: 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page")
Fixes: 38607c62b34b ("fs/dax: properly refcount fs dax pages")
Reported-by: Alison Schofield
Closes: https://lkml.kernel.org/r/Z_W9Oeg-D9FhImf3@aschofie-mobl2.lan
Cc: Alexander Viro
Cc: Christian Brauner
Cc: Jan Kara
Cc: Dan Williams
Cc: Matthew Wilcox
Cc: Andrew Morton
Cc: Alistair Popple
Cc: Christoph Hellwig
Signed-off-by: David Hildenbrand
Tested-by: Alison Schofield
---
 fs/dax.c           |  1 +
 include/linux/mm.h | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index af5045b0f476e..676303419e9e8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -396,6 +396,7 @@ static inline unsigned long dax_folio_put(struct folio *folio)
 	order = folio_order(folio);
 	if (!order)
 		return 0;
+	folio_reset_order(folio);
 
 	for (i = 0; i < (1UL << order); i++) {
 		struct dev_pagemap *pgmap = page_pgmap(&folio->page);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b7f13f087954b..bf55206935c46 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1218,6 +1218,23 @@ static inline unsigned int folio_order(const struct folio *folio)
 	return folio_large_order(folio);
 }
 
+/**
+ * folio_reset_order - Reset the folio order and derived _nr_pages
+ * @folio: The folio.
+ *
+ * Reset the order and derived _nr_pages to 0. Must only be used in the
+ * process of splitting large folios.
+ */
+static inline void folio_reset_order(struct folio *folio)
+{
+	if (WARN_ON_ONCE(!folio_test_large(folio)))
+		return;
+	folio->_flags_1 &= ~0xffUL;
+#ifdef NR_PAGES_IN_LARGE_FOLIO
+	folio->_nr_pages = 0;
+#endif
+}
+
 #include
 
 /*