From patchwork Fri May 7 21:21:44 2021
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 12245191
From: Mina Almasry
Date: Fri, 7 May 2021 14:21:44 -0700
Subject: resv_huge_page underflow with userfaultfd test
To: Mike Kravetz, Linux-MM, Axel Rasmussen, aarcange@redhat.com,
 peterx@redhat.com

Hi folks,

I ran into a bug that I'm not sure how to solve, so I'm wondering if
anyone has suggestions on what the issue could be and how to
investigate. I added the WARN_ON_ONCE() here to catch instances of
resv_huge_pages underflowing:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 629aa4c2259c..7d763eed650f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1165,7 +1165,21 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
 	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
 		SetHPageRestoreReserve(page);
+		WARN_ON_ONCE(!h->resv_huge_pages);
 		h->resv_huge_pages--;
 	}

And ran the userfaultfd selftests like so:

echo 1024 > /proc/sys/vm/nr_hugepages
mkdir -p /mnt/huge
mount -t hugetlbfs none /mnt/huge
./tools/testing/selftests/vm/userfaultfd hugetlb_shared 1024 200 /mnt/huge/userfaultfd_test

And ran into this warning, indicating that this test does trigger an
underflow:

[ 11.163403] ------------[ cut here ]------------
[ 11.163404] WARNING: CPU: 0 PID: 237 at mm/hugetlb.c:1178 alloc_huge_page+0x558/0x5a0
[ 11.163413] Modules linked in:
[ 11.163419] CPU: 0 PID: 237 Comm: userfaultfd Not tainted 5.12.0-dbg-DEV #135
[ 11.163424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 11.163429] RIP: 0010:alloc_huge_page+0x558/0x5a0
[ 11.163432] Code: b0 00 0f 85 3d ff ff ff e9 2a ff ff ff be 01 00 00 00 48 89 df e8 18 e7 ff ff 48 f7 d8 4c 89 ef 48 89 c6 e8 da d7 ff ff eb 8c <0f> 0b 4d 8b 85 c0 00 00 00 e9 95 fd ff ff e8 35 59 84 00 4c 897
[ 11.163434] RSP: 0018:ffff94bb0073fc80 EFLAGS: 00010046
[ 11.163436] RAX: 0000000000000080 RBX: 0000000000000000 RCX: 5fa252c406a76700
[ 11.163438] RDX: c0000000ffff7fff RSI: 0000000000000004 RDI: 0000000000017ffd
[ 11.163439] RBP: ffff94bb0073fcf8 R08: 0000000000000000 R09: ffffffff9813ba70
[ 11.163440] R10: 00000000ffff7fff R11: 0000000000000000 R12: ffff8ac7800558c8
[ 11.163442] R13: ffffffff993f8880 R14: 00007f0dfa200000 R15: ffffed85453e0000
[ 11.163443] FS: 00007f0d731fc700(0000) GS:ffff8acba9400000(0000) knlGS:0000000000000000
[ 11.163445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.163448] CR2: 00007f0e65e00028 CR3: 0000000108d50003 CR4: 0000000000370ef0
[ 11.163452] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.163453] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 11.163455] Call Trace:
[ 11.163468] hugetlb_mcopy_atomic_pte+0xcb/0x450
[ 11.163477] mcopy_atomic+0xa08/0xd60
[ 11.163480] ? __might_fault+0x56/0x80
[ 11.163493] userfaultfd_ioctl+0xb18/0xd60
[ 11.163502] __se_sys_ioctl+0x77/0xc0
[ 11.163507] __x64_sys_ioctl+0x1d/0x20
[ 11.163510] do_syscall_64+0x3f/0x80
[ 11.163515] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 11.163519] RIP: 0033:0x45ec87
[ 11.163531] Code: 3c 1c 48 f7 d8 49 39 c4 72 b8 e8 64 63 03 00 85 c0 78 bd 48 83 c4 08 4c 89 e0 5b 41 5c c3 0f 1f 44 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 018
[ 11.163532] RSP: 002b:00007f0d731fc248 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 11.163534] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000045ec87
[ 11.163536] RDX: 00007f0d731fc290 RSI: 00000000c028aa03 RDI: 0000000000000004
[ 11.163537] RBP: 00007f0d731fc270 R08: 00000000004022b3 R09: 00007f0d731fc700
[ 11.163538] R10: 00007f0d731fc9d0 R11: 0000000000000206 R12: 00007fff610cd82e
[ 11.163539] R13: 00007fff610cd82f R14: 00007f0d731fc400 R15: 0000000001002000
[ 11.163549] irq event stamp: 722
[ 11.163550] hardirqs last enabled at (721): [] kmem_cache_alloc_trace+0x1db/0x370
[ 11.163558] hardirqs last disabled at (722): [] _raw_spin_lock_irq+0x32/0x80
[ 11.163560] softirqs last enabled at (130): [] __irq_exit_rcu+0xf6/0x100
[ 11.163564] softirqs last disabled at (125): [] __irq_exit_rcu+0xf6/0x100
[ 11.163567] ---[ end trace 358ac5c76c211ea1 ]---

Debugging further, I find that resv_huge_pages temporarily underflows
by 1 multiple times during the test run, but a __free_huge_page() is
always subsequently called that overflows it back to 0.
resv_huge_pages is always 0 at the end of the test. I initially looked
at this because I suspected a problem in the resv_huge_pages
accounting, but the accounting seems fine in itself: it correctly
decrements resv_huge_pages when a page is allocated from the
reservation and correctly increments it back when that page is freed.

I'm not that familiar with the userfaultfd/hugetlb code, so I was
hoping to solicit some suggestions for what the issue could be. Things
I've tried so far:

- Adding code that prevents resv_huge_pages from underflowing causes
  the test to fail, so it seems that in this test the calling code
  actually expects to be able to temporarily allocate 1 more page than
  the VMA has reserved, which seems like a bug maybe?
- Modifying hugetlb_mcopy_atomic_pte() to not use reserved pages causes
  the test to fail again. Doing that and overprovisioning
  /proc/sys/vm/nr_hugepages causes the test to pass again, but I'm not
  sure that's right (not familiar with the code).
- The failure reproduces as far back as 5.11, so it doesn't seem to be
  related to any recent changes.

Thanks in advance!