From patchwork Thu Sep 1 17:56:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 12963023
Date: Thu, 1 Sep 2022 13:56:52 -0400
From: Brian Foster
To: linux-mm@kvack.org
Cc: Matthew Wilcox
Subject: hugepage/swap: kernel BUG at mm/swap_state.c:154!
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Hi Willy,

I'm seeing a softlockup issue in the madvise() pageout -> reclaim
codepath that turned into the VM_BUG_ON() splat below[1] with debug
enabled. The bug corresponds to the following code in
__delete_from_swap_cache():

	...
	for (i = 0; i < nr; i++) {
		void *entry = xas_store(&xas, shadow);
		VM_BUG_ON_FOLIO(entry != folio, folio);
		set_page_private(folio_page(folio, i), 0);
		xas_next(&xas);
	}
	...

The immediate reason for failure is because the swap entry is zero, so
the entry passed in from the caller (via folio->private) looks bogus.
This page was originally added to swapcache as a 2MB hugepage, then is
being split here and each subpage removed/freed via this split call. The
splat occurs attempting to remove the first subpage.

It looks like the reason the swapentry is lost is page->private being
cleared a bit earlier in __split_huge_page_tail(). This was added via
commit b653db77350c7 ("mm: Clear page->private when splitting or
migrating a page"). I don't have context for the problem fixed by that
patch, but (so far) the following tweak seems to address both issues
I've seen (so I don't have detailed root cause of the soft lockup
variant, but from testing it appears to be a side effect of this
problem):

Thoughts? If this makes sense I can send it as a proper patch..

Brian

[1] bug splat:

page:000000001c1895ba refcount:2 mapcount:0 mapping:00000000164a725a index:0x7f6441401 pfn:0x1d07a01
memcg:ff4ec5f22893a000
anon flags: 0x17ffffc008043d(locked|uptodate|dirty|lru|active|owner_priv_1|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc008043d ffd64cba341e8008 ffd64cba341e8088 ff4ec5f20ea6e791
raw: 00000007f6441401 0000000000000000 00000002ffffffff ff4ec5f22893a000
page dumped because: VM_BUG_ON_FOLIO(entry != folio)
------------[ cut here ]------------
kernel BUG at mm/swap_state.c:154!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 34 PID: 12321 Comm: stress-ng Kdump: loaded Tainted: G E 6.0.0-rc3+ #4
Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.2.4 05/28/2021
RIP: 0010:__delete_from_swap_cache+0x21c/0x250
Code: 04 25 28 00 00 00 75 46 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 c7 c6 f8 ad 59 96 4c 89 f7 e8 14 e3 fb ff <0f> 0b 48 c7 c6 f0 34 59 96 4c 89 f7 e8 03 e3 fb ff 0f 0b 48 c7 c6
RSP: 0018:ff8dce04e9267878 EFLAGS: 00010046
RAX: 0000000000000034 RBX: 0000000000000000 RCX: 0000000000000027
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ff4ec5f0bfc5f860
RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000ffff7fff
R10: ff8dce04e9267708 R11: ffffffff96fe7368 R12: ff4ec5b2a1e88000
R13: 0000000000000001 R14: ffd64cba341e8040 R15: 0000000000000000
FS: 00007f660b421740(0000) GS:ff4ec5f0bfc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6498aa6000 CR3: 00000001764ba006 CR4: 0000000000771ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 delete_from_swap_cache+0x4c/0xc0
 try_to_free_swap+0x115/0x160
 free_swap_cache+0x7f/0xc0
 free_page_and_swap_cache+0xf/0xd0
 __split_huge_page+0x4b5/0x780
 split_huge_page_to_list+0x6f9/0xa80
 madvise_cold_or_pageout_pte_range+0x433/0xd90
 ? sysvec_call_function_single+0x41/0x90
 walk_pmd_range.isra.0+0xc3/0x320
 walk_pud_range.isra.0+0x137/0x250
 walk_p4d_range+0x10b/0x170
 walk_pgd_range+0x11e/0x180
 __walk_page_range+0x56/0x1a0
 walk_page_range+0xaa/0x130
 madvise_pageout+0xf6/0x170
 ? rseq_get_rseq_cs.isra.0+0x16/0x220
 madvise_vma_behavior+0x44d/0x6c0
 ? find_vma+0x20/0x80
 do_madvise.part.0+0x1a7/0x330
 __x64_sys_madvise+0x5a/0x70
 do_syscall_64+0x59/0x90
 ? ktime_get+0x35/0xa0
 ? clockevents_program_event+0x92/0x100
 ? hrtimer_interrupt+0x126/0x210
 ? sched_clock_cpu+0x9/0xb0
 ? irqtime_account_irq+0x3c/0xb0
 ? __irq_exit_rcu+0x46/0xe0
 ? sysvec_apic_timer_interrupt+0x3c/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f660b23eeeb
Code: 73 01 c3 48 8b 0d 35 af 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 05 af 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd3bfcdee8 EFLAGS: 00000202 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007ffd3bfce0d0 RCX: 00007f660b23eeeb
RDX: 0000000000000015 RSI: 0000000257b3f000 RDI: 00007f63b1082000
RBP: 00007f63b1082000 R08: 0000000000000000 R09: 00000000000000cb
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000257b3f000
R13: 00007f6608bc1000 R14: 00007ffd3bfcdff0 R15: 0000000000000000
Modules linked in: rfkill(E) sunrpc(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) i10nm_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) ipmi_ssif(E) coretemp(E) kvm_intel(E) mgag200(E) i2c_algo_bit(E) drm_shmem_helper(E) mlx5_ib(E) kvm(E) dcdbas(E) irqbypass(E) drm_kms_helper(E) ib_uverbs(E) rapl(E) acpi_ipmi(E) intel_cstate(E) ipmi_si(E) syscopyarea(E) mei_me(E) dell_smbios(E) ib_core(E) sysfillrect(E) ipmi_devintf(E) intel_uncore(E) nd_pmem(E) wmi_bmof(E) pcspkr(E) dell_wmi_descriptor(E) sysimgblt(E) i2c_i801(E) isst_if_mbox_pci(E) isst_if_mmio(E) intel_vsec(E) isst_if_common(E) fb_sys_fops(E) i2c_smbus(E) mei(E) ipmi_msghandler(E) nd_btt(E) intel_pch_thermal(E) dax_pmem(E) acpi_power_meter(E) fuse(E) drm(E) xfs(E) libcrc32c(E) sd_mod(E) sg(E) lpfc(E) nvmet_fc(E) mlx5_core(E) nvmet(E) mlxfw(E) nvme_fc(E) nvme_fabrics(E) crct10dif_pclmul(E) crc32_pclmul(E) tls(E) crc32c_intel(E) nvme_core(E) ahci(E) t10_pi(E) psample(E) ghash_clmulni_intel(E) libahci(E) pci_hyperv_intf(E) megaraid_sas(E) scsi_transport_fc(E) bnxt_en(E) tg3(E) nfit(E) libata(E) wmi(E) libnvdimm(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9414ee57c5b..c2ddbb81a743 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2445,7 +2445,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			page_tail);
 	page_tail->mapping = head->mapping;
 	page_tail->index = head->index + tail;
-	page_tail->private = 0;
+	if (!PageSwapCache(page_tail))
+		page_tail->private = 0;
 
 	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();
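
For readers without the kernel source to hand, the failure described above can be modelled in userspace. This is only a rough sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, a small array stands in for the swap cache xarray, four subpages stand in for the subpages of a 2MB hugepage, and the split step is assumed to repoint each swap cache slot at its own subpage before the subpages are removed one by one. It tries to show why a tail page whose private field was zeroed can no longer find itself in the swap cache, and why preserving private for swapcache pages appears to avoid that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types or functions are the kernel's. */
#define TOY_NR_SUBPAGES 4	/* stand-in for the subpages of a 2MB hugepage */
#define TOY_CACHE_SLOTS 16	/* stand-in for the swap cache xarray */

struct toy_page {
	unsigned long private;	/* models page->private (the swap entry value) */
	bool swapcache;		/* models PageSwapCache() */
};

/* Slot index is the swap entry value; the slot holds the owning page. */
static struct toy_page *toy_cache[TOY_CACHE_SLOTS];

/* Add a huge page at swap entry 'base': every slot points at the head. */
static void toy_add_to_swap_cache(struct toy_page *pages, unsigned long base)
{
	assert(base + TOY_NR_SUBPAGES <= TOY_CACHE_SLOTS);
	for (int i = 0; i < TOY_NR_SUBPAGES; i++) {
		pages[i].private = base + i;
		pages[i].swapcache = true;
		toy_cache[base + i] = &pages[0];
	}
}

/*
 * Split the huge page: each slot is repointed at its own subpage.
 * With keep_swapcache_private == false the tail's private is always
 * zeroed (the behavior reported above); with true it is preserved for
 * swapcache pages (the proposed tweak).
 */
static void toy_split(struct toy_page *pages, bool keep_swapcache_private)
{
	unsigned long base = pages[0].private;

	for (int i = 1; i < TOY_NR_SUBPAGES; i++) {
		toy_cache[base + i] = &pages[i];
		if (!(keep_swapcache_private && pages[i].swapcache))
			pages[i].private = 0;
	}
}

/* Returns false where the kernel would trip VM_BUG_ON_FOLIO(entry != folio). */
static bool toy_delete_from_swap_cache(struct toy_page *page)
{
	unsigned long entry = page->private;

	if (toy_cache[entry] != page)
		return false;
	toy_cache[entry] = NULL;
	page->private = 0;
	page->swapcache = false;
	return true;
}

/* Run add -> split -> delete of the first tail page; report success. */
bool toy_run(bool keep_swapcache_private)
{
	struct toy_page pages[TOY_NR_SUBPAGES];

	for (int i = 0; i < TOY_CACHE_SLOTS; i++)
		toy_cache[i] = NULL;
	toy_add_to_swap_cache(pages, 8);	/* arbitrary nonzero swap entry */
	toy_split(pages, keep_swapcache_private);
	return toy_delete_from_swap_cache(&pages[1]);
}
```

In this model toy_run(false) returns false, because the zeroed private makes the lookup land on slot 0 instead of the slot holding the tail page, while toy_run(true) returns true. Locking, reference counts and the shadow entry are all left out.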