From patchwork Fri Aug 26 22:03:20 2022
Message-ID: <20220826220329.1495407-2-zokeefe@google.com>
Subject: [PATCH mm-unstable v2 1/9] mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()
From: "Zach O'Keefe"
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-api@vger.kernel.org, Axel Rasmussen,
 James Houghton, Hugh Dickins, Yang Shi, Miaohe Lin, David Hildenbrand,
 David Rientjes, Matthew Wilcox, Pasha Tatashin, Peter Xu, Rongwei Wang,
 SeongJae Park, Song Liu, Vlastimil Babka, Chris Kennelly,
 "Kirill A. Shutemov", Minchan Kim, Patrick Xia, "Zach O'Keefe"

Extend 'mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()'
to shmem, allowing callers to ignore
/sys/kernel/transparent_hugepage/shmem_enabled and tmpfs huge= mount.

This is intended to be used by MADV_COLLAPSE, and the rationale is
analogous to the anon/file case: MADV_COLLAPSE is not coupled to
directives that advise the kernel's decisions on when THPs should be
considered eligible. shmem/tmpfs always claims large folio support,
regardless of sysfs or mount options.
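For quick reference, the net effect on shmem_is_huge() can be condensed
as follows (an illustrative sketch of the patched function, mirroring
the diff below rather than separate new code):

	if (!S_ISREG(inode->i_mode))
		return false;
	if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
		return false;	/* MADV_NOHUGEPAGE / prctl still win */
	if (shmem_huge == SHMEM_HUGE_FORCE || shmem_huge_force)
		return true;	/* forced callers skip sysfs/mount policy */
	if (shmem_huge == SHMEM_HUGE_DENY)
		return false;
	/* ... otherwise fall through to the per-mount huge= policy */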
Signed-off-by: Zach O'Keefe
---
 include/linux/shmem_fs.h | 10 ++++++----
 mm/huge_memory.c         |  2 +-
 mm/shmem.c               | 18 +++++++++---------
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ff0b990de83d..f5e9b01dbf4c 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -92,11 +92,13 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 extern void shmem_truncate_range(struct inode *inode,
 				 loff_t start, loff_t end);
 int shmem_unuse(unsigned int type);
-extern bool shmem_is_huge(struct vm_area_struct *vma,
-			  struct inode *inode, pgoff_t index);
-static inline bool shmem_huge_enabled(struct vm_area_struct *vma)
+extern bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
+			  pgoff_t index, bool shmem_huge_force);
+static inline bool shmem_huge_enabled(struct vm_area_struct *vma,
+				      bool shmem_huge_force)
 {
-	return shmem_is_huge(vma, file_inode(vma->vm_file), vma->vm_pgoff);
+	return shmem_is_huge(vma, file_inode(vma->vm_file), vma->vm_pgoff,
+			     shmem_huge_force);
 }
 extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88d98241a635..b3acc8e3046d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -119,7 +119,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 	 * own flags.
 	 */
 	if (!in_pf && shmem_file(vma->vm_file))
-		return shmem_huge_enabled(vma);
+		return shmem_huge_enabled(vma, !enforce_sysfs);
 
 	/* Enforce sysfs THP requirements as necessary */
 	if (enforce_sysfs &&

diff --git a/mm/shmem.c b/mm/shmem.c
index 42e5888bf84d..b9bab1abf142 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -472,20 +472,20 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
-bool shmem_is_huge(struct vm_area_struct *vma,
-		   struct inode *inode, pgoff_t index)
+bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
+		   pgoff_t index, bool shmem_huge_force)
 {
 	loff_t i_size;
 
 	if (!S_ISREG(inode->i_mode))
 		return false;
-	if (shmem_huge == SHMEM_HUGE_DENY)
-		return false;
 	if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
 	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
 		return false;
-	if (shmem_huge == SHMEM_HUGE_FORCE)
+	if (shmem_huge == SHMEM_HUGE_FORCE || shmem_huge_force)
 		return true;
+	if (shmem_huge == SHMEM_HUGE_DENY)
+		return false;
 
 	switch (SHMEM_SB(inode->i_sb)->huge) {
 	case SHMEM_HUGE_ALWAYS:
@@ -680,8 +680,8 @@ static long shmem_unused_huge_count(struct super_block *sb,
 
 #define shmem_huge SHMEM_HUGE_DENY
 
-bool shmem_is_huge(struct vm_area_struct *vma,
-		   struct inode *inode, pgoff_t index)
+bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
+		   pgoff_t index, bool shmem_huge_force)
 {
 	return false;
 }
@@ -1069,7 +1069,7 @@ static int shmem_getattr(struct user_namespace *mnt_userns,
 				  STATX_ATTR_NODUMP);
 	generic_fillattr(&init_user_ns, inode, stat);
 
-	if (shmem_is_huge(NULL, inode, 0))
+	if (shmem_is_huge(NULL, inode, 0, false))
 		stat->blksize = HPAGE_PMD_SIZE;
 
 	if (request_mask & STATX_BTIME) {
@@ -1910,7 +1910,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		return 0;
 	}
 
-	if (!shmem_is_huge(vma, inode, index))
+	if (!shmem_is_huge(vma, inode, index, false))
 		goto alloc_nohuge;
 
 	huge_gfp = vma_thp_gfp_mask(vma);

From patchwork Fri Aug 26 22:03:21 2022
Message-ID: <20220826220329.1495407-3-zokeefe@google.com>
Subject: [PATCH mm-unstable v2 2/9] mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds
From: "Zach O'Keefe"
To: linux-mm@kvack.org

The main benefit of THPs is that they can be mapped at the pmd level,
increasing the likelihood of a TLB hit and reducing the cycles spent in
page table walks. pte-mapped hugepages - that is, hugepage-aligned
compound pages of order HPAGE_PMD_ORDER - although contiguous in
physical memory, don't have this advantage. In fact, one could argue
they are detrimental to system performance overall, since they occupy a
precious hugepage-aligned/sized region of physical memory that could
otherwise be used more effectively. Additionally, pte-mapped hugepages
can be the cheapest memory for khugepaged to collapse, since no new
hugepage allocation or copying of memory contents is necessary - we
only need to update the mapping page tables.

In the anonymous collapse path, we are able to collapse pte-mapped
hugepages (albeit, perhaps suboptimally), but the file/shmem path makes
no effort when compound pages (of any order) are encountered.

Identify pte-mapped hugepages in the file/shmem collapse path.
In khugepaged context, attempt to update the page tables mapping this
hugepage.

Note that these collapses still count towards the
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed counter,
and if the pte-mapped hugepage was also mapped into multiple processes'
address spaces, the counter could be incremented once for each page
table update. Since we increment the counter when a pte-mapped hugepage
is successfully added to the list of to-collapse pte-mapped THPs, it's
also possible that we never actually update the page table. This is
different from how file/shmem pages_collapsed accounting works today,
where only a successful page cache update is counted (it's likewise
possible there that no page tables are actually changed). Though it
incurs some slop, this is preferred to either not accounting for the
event at all, or plumbing data through struct mm_slot on whether to
account for the collapse or not.

Note that work still needs to be done to support arbitrary compound
pages, and that this should all be converted to using folios.

Signed-off-by: Zach O'Keefe
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 49 ++++++++++++++++++++++++++----
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 55392bf30a03..fbbb25494d60 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -17,6 +17,7 @@
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
 	EM( SCAN_PTE_NON_PRESENT,	"pte_non_present")		\
 	EM( SCAN_PTE_UFFD_WP,		"pte_uffd_wp")			\
+	EM( SCAN_PTE_MAPPED_HUGEPAGE,	"pte_mapped_hugepage")		\
 	EM( SCAN_PAGE_RO,		"no_writable_page")		\
 	EM( SCAN_LACK_REFERENCED_PAGE,	"lack_referenced_page")		\
 	EM( SCAN_PAGE_NULL,		"page_null")			\

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d8e388106322..6022a08db1cd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -34,6 +34,7 @@ enum scan_result {
 	SCAN_EXCEED_SHARED_PTE,
 	SCAN_PTE_NON_PRESENT,
 	SCAN_PTE_UFFD_WP,
+	SCAN_PTE_MAPPED_HUGEPAGE,
 	SCAN_PAGE_RO,
 	SCAN_LACK_REFERENCED_PAGE,
 	SCAN_PAGE_NULL,
@@ -1349,18 +1350,22 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
  * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
  * khugepaged should try to collapse the page table.
  */
-static void khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
+static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
 					  unsigned long addr)
 {
 	struct mm_slot *mm_slot;
+	bool ret = false;
 
 	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
 
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = get_mm_slot(mm);
-	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
+	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) {
 		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
+		ret = true;
+	}
 	spin_unlock(&khugepaged_mm_lock);
+	return ret;
 }
 
 static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -1397,9 +1402,16 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	pte_t *start_pte, *pte;
 	pmd_t *pmd;
 	spinlock_t *ptl;
-	int count = 0;
+	int count = 0, result = SCAN_FAIL;
 	int i;
 
+	mmap_assert_write_locked(mm);
+
+	/* Fast check before locking page if already PMD-mapped */
+	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
+	if (result != SCAN_SUCCEED)
+		return;
+
 	if (!vma || !vma->vm_file ||
 	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
 		return;
@@ -1748,7 +1760,11 @@ static int collapse_file(struct mm_struct *mm, struct file *file,
 		 * we locked the first page, then a THP might be there already.
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			result = compound_order(page) == HPAGE_PMD_ORDER &&
+					index == start
+					/* Maybe PMD-mapped */
+					? SCAN_PTE_MAPPED_HUGEPAGE
+					: SCAN_PAGE_COMPOUND;
 			goto out_unlock;
 		}
 
@@ -1986,7 +2002,11 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 		 * into a PMD sized page
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			result = compound_order(page) == HPAGE_PMD_ORDER &&
+					xas.xa_index == start
+					/* Maybe PMD-mapped */
+					? SCAN_PTE_MAPPED_HUGEPAGE
+					: SCAN_PAGE_COMPOUND;
 			break;
 		}
 
@@ -2046,6 +2066,12 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 {
 }
+
+static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
+					  unsigned long addr)
+{
+	return false;
+}
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
@@ -2137,8 +2163,19 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 							  &mmap_locked, cc);
 			}
-			if (*result == SCAN_SUCCEED)
+			switch (*result) {
+			case SCAN_PTE_MAPPED_HUGEPAGE:
+				if (!khugepaged_add_pte_mapped_thp(mm,
+						khugepaged_scan.address))
+					break;
+				fallthrough;
+			case SCAN_SUCCEED:
 				++khugepaged_pages_collapsed;
+				break;
+			default:
+				break;
+			}
+
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
 			progress += HPAGE_PMD_NR;
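Condensing the above (an illustrative restatement, not code from the
patch): the file/shmem scan paths now flag a PMD-order compound page
sitting at the hugepage-aligned index, and khugepaged queues the
address so the page tables can be retracted later:

	/* scan paths: possibly PMD-mappable hugepage encountered */
	if (PageTransCompound(page) &&
	    compound_order(page) == HPAGE_PMD_ORDER && index == start)
		result = SCAN_PTE_MAPPED_HUGEPAGE;

	/* khugepaged_scan_mm_slot(): queue the address and account for it */
	if (*result == SCAN_PTE_MAPPED_HUGEPAGE &&
	    khugepaged_add_pte_mapped_thp(mm, khugepaged_scan.address))
		++khugepaged_pages_collapsed;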
From patchwork Fri Aug 26 22:03:22 2022
Message-ID: <20220826220329.1495407-4-zokeefe@google.com>
Subject: [PATCH mm-unstable v2 3/9] mm/madvise: add file and shmem support to MADV_COLLAPSE
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Add support for MADV_COLLAPSE to collapse
shmem-backed and file-backed memory into THPs (the file-backed case
requires CONFIG_READ_ONLY_THP_FOR_FS=y).

On success, the backing memory will be a hugepage. For the memory range
and process provided, the page tables will synchronously have a huge
pmd installed, mapping the THP. Other mappings of the file extent
mapped by the memory range may be added to a set of entries that
khugepaged will later process and attempt to update their page tables
to map the THP by a pmd.

This functionality unlocks two important uses:

(1) Immediately backing executable text by THPs. The current support
provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large
system, which might keep services from serving at their full rated load
after (re)starting. Tricks like mremap(2)'ing text onto anonymous
memory to immediately realize iTLB performance prevent page sharing and
demand paging, both of which increase steady-state memory footprint.
Now, we can have the best of both worlds: peak upfront performance and
lower RAM footprints.

(2) userfaultfd-based live migration of virtual machines satisfies UFFD
faults by fetching native-sized pages over the network (to avoid the
latency of transferring an entire hugepage). However, after guest
memory has been fully copied to the new host, MADV_COLLAPSE can be used
to immediately increase guest performance.

Signed-off-by: Zach O'Keefe
---
 include/linux/khugepaged.h         |  13 +-
 include/trace/events/huge_memory.h |   1 +
 kernel/events/uprobes.c            |   2 +-
 mm/khugepaged.c                    | 241 ++++++++++++++++++++++-------
 4 files changed, 198 insertions(+), 59 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 384f034ae947..70162d707caf 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -16,11 +16,13 @@ extern void khugepaged_enter_vma(struct vm_area_struct *vma,
 				 unsigned long vm_flags);
 extern void khugepaged_min_free_kbytes_update(void);
 #ifdef CONFIG_SHMEM
-extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
+extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
+				   bool install_pmd);
 #else
-static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
-					   unsigned long addr)
+static inline int collapse_pte_mapped_thp(struct mm_struct *mm,
+					  unsigned long addr, bool install_pmd)
 {
+	return 0;
 }
 #endif
 
@@ -46,9 +48,10 @@ static inline void khugepaged_enter_vma(struct vm_area_struct *vma,
 					unsigned long vm_flags)
 {
 }
-static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
-					   unsigned long addr)
+static inline int collapse_pte_mapped_thp(struct mm_struct *mm,
+					  unsigned long addr, bool install_pmd)
 {
+	return 0;
 }
 
 static inline void khugepaged_min_free_kbytes_update(void)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index fbbb25494d60..a8db658e99e9 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -11,6 +11,7 @@
 	EM( SCAN_FAIL,			"failed")			\
 	EM( SCAN_SUCCEED,		"succeeded")			\
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
+	EM( SCAN_PMD_NON_PRESENT,	"pmd_non_present")		\
 	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 401bc2d24ce0..d48c47811e45 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -553,7 +553,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 
 	/* try collapse pmd for compound page */
 	if (!ret && orig_page_huge)
-		collapse_pte_mapped_thp(mm, vaddr);
+		collapse_pte_mapped_thp(mm, vaddr, false);
 
 	return ret;
 }

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6022a08db1cd..34c0c74b3839 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -28,6 +28,7 @@ enum scan_result {
 	SCAN_FAIL,
 	SCAN_SUCCEED,
 	SCAN_PMD_NULL,
+	SCAN_PMD_NON_PRESENT,
 	SCAN_PMD_MAPPED,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
@@ -870,6 +871,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	if (!hugepage_vma_check(vma, vma->vm_flags, false, false,
 				cc->is_khugepaged))
 		return SCAN_VMA_CHECK;
+	return SCAN_SUCCEED;
+}
+
+static int hugepage_vma_revalidate_anon(struct mm_struct *mm,
+					unsigned long address,
+					struct vm_area_struct **vmap,
+					struct collapse_control *cc)
+{
+	int ret = hugepage_vma_revalidate(mm, address, vmap, cc);
+
+	if (ret != SCAN_SUCCEED)
+		return ret;
 	/*
 	 * Anon VMA expected, the address may be unmapped then
 	 * remapped to file after khugepaged reaquired the mmap_lock.
@@ -877,8 +890,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	 * hugepage_vma_check may return true for qualified file
 	 * vmas.
 	 */
-	if (!vma->anon_vma || !vma_is_anonymous(vma))
-		return SCAN_VMA_CHECK;
+	if (!(*vmap)->anon_vma || !vma_is_anonymous(*vmap))
+		return SCAN_PAGE_ANON;
 	return SCAN_SUCCEED;
 }
 
@@ -899,7 +912,7 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 	barrier();
 #endif
 	if (!pmd_present(pmde))
-		return SCAN_PMD_NULL;
+		return SCAN_PMD_NON_PRESENT;
 	if (pmd_trans_huge(pmde))
 		return SCAN_PMD_MAPPED;
 	if (pmd_bad(pmde))
@@ -1027,7 +1040,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		goto out_nolock;
 
 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma, cc);
+	result = hugepage_vma_revalidate_anon(mm, address, &vma, cc);
 	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
@@ -1058,7 +1071,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * handled by the anon_vma lock + PG_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma, cc);
+	result = hugepage_vma_revalidate_anon(mm, address, &vma, cc);
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
@@ -1361,13 +1374,44 @@ static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot = get_mm_slot(mm);
 	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) {
+		int i;
+		/*
+		 * Multiple callers may be adding entries here. Do a quick
+		 * check to see the entry hasn't already been added by someone
+		 * else.
+		 */
+		for (i = 0; i < mm_slot->nr_pte_mapped_thp; ++i)
+			if (mm_slot->pte_mapped_thp[i] == addr)
+				goto out;
 		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
 		ret = true;
 	}
+out:
 	spin_unlock(&khugepaged_mm_lock);
 	return ret;
 }
 
+/* hpage must be locked, and mmap_lock must be held in write */
+static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
+			pmd_t *pmdp, struct page *hpage)
+{
+	struct vm_fault vmf = {
+		.vma = vma,
+		.address = addr,
+		.flags = 0,
+		.pmd = pmdp,
+	};
+
+	VM_BUG_ON(!PageTransHuge(hpage));
+	mmap_assert_write_locked(vma->vm_mm);
+
+	if (do_set_pmd(&vmf, hpage))
+		return SCAN_FAIL;
+
+	get_page(hpage);
+	return SCAN_SUCCEED;
+}
+
 static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 				  unsigned long addr, pmd_t *pmdp)
 {
@@ -1389,12 +1433,14 @@ static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *v
  *
  * @mm: process address space where collapse happens
  * @addr: THP collapse address
+ * @install_pmd: If a huge PMD should be installed
  *
  * This function checks whether all the PTEs in the PMD are pointing to the
  * right THP. If so, retract the page table so the THP can refault in with
- * as pmd-mapped.
+ * as pmd-mapped. Possibly install a huge PMD mapping the THP.
  */
-void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
+int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
+			    bool install_pmd)
 {
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	struct vm_area_struct *vma = vma_lookup(mm, haddr);
@@ -1409,12 +1455,12 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 
 	/* Fast check before locking page if already PMD-mapped */
 	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
-	if (result != SCAN_SUCCEED)
-		return;
+	if (result == SCAN_PMD_MAPPED)
+		return result;
 
 	if (!vma || !vma->vm_file ||
 	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
-		return;
+		return SCAN_VMA_CHECK;
 
 	/*
 	 * If we are here, we've succeeded in replacing all the native pages
@@ -1424,24 +1470,44 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	 * analogously elide sysfs THP settings here.
 	 */
 	if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
-		return;
+		return SCAN_VMA_CHECK;
 
 	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
 	if (userfaultfd_wp(vma))
-		return;
+		return SCAN_PTE_UFFD_WP;
 
 	hpage = find_lock_page(vma->vm_file->f_mapping,
 			       linear_page_index(vma, haddr));
 	if (!hpage)
-		return;
+		return SCAN_PAGE_NULL;
 
-	if (!PageHead(hpage))
+	if (!PageHead(hpage)) {
+		result = SCAN_FAIL;
 		goto drop_hpage;
+	}
 
-	if (find_pmd_or_thp_or_none(mm, haddr, &pmd) != SCAN_SUCCEED)
+	if (!PageTransCompound(hpage)) {
+		result = SCAN_FAIL;
 		goto drop_hpage;
+	}
+
+	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
+	switch (result) {
+	case SCAN_SUCCEED:
+		break;
+	case SCAN_PMD_NON_PRESENT:
+		/*
+		 * In MADV_COLLAPSE path, possible race with khugepaged where
+		 * all pte entries have been removed and pmd cleared. If so,
+		 * skip all the pte checks and just update the pmd mapping.
+		 */
+		goto install_pmd;
+	default:
+		goto drop_hpage;
+	}
 
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
+	result = SCAN_FAIL;
 
 	/* step 1: check all mapped PTEs are to the right huge page */
 	for (i = 0, addr = haddr, pte = start_pte;
@@ -1453,8 +1519,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 			continue;
 
 		/* page swapped out, abort */
-		if (!pte_present(*pte))
+		if (!pte_present(*pte)) {
+			result = SCAN_PTE_NON_PRESENT;
 			goto abort;
+		}
 
 		page = vm_normal_page(vma, addr, *pte);
 		if (WARN_ON_ONCE(page && is_zone_device_page(page)))
@@ -1489,12 +1557,19 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
 	}
 
-	/* step 4: collapse pmd */
+	/* step 4: remove pte entries */
 	collapse_and_free_pmd(mm, vma, haddr, pmd);
+
+install_pmd:
+	/* step 5: install pmd entry */
+	result = install_pmd
+			? set_huge_pmd(vma, haddr, pmd, hpage)
+			: SCAN_SUCCEED;
+
 drop_hpage:
 	unlock_page(hpage);
 	put_page(hpage);
-	return;
+	return result;
 
 abort:
 	pte_unmap_unlock(start_pte, ptl);
@@ -1516,22 +1591,29 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 		goto out;
 
 	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
-		collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i]);
+		collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i], false);
 
 out:
 	mm_slot->nr_pte_mapped_thp = 0;
 	mmap_write_unlock(mm);
 }
 
-static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
+static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
+			       struct mm_struct *target_mm,
+			       unsigned long target_addr, struct page *hpage,
+			       struct collapse_control *cc)
 {
 	struct vm_area_struct *vma;
-	struct mm_struct *mm;
-	unsigned long addr;
-	pmd_t *pmd;
+	int target_result = SCAN_FAIL;
 
 	i_mmap_lock_write(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		int result = SCAN_FAIL;
+		struct mm_struct *mm = NULL;
+		unsigned long addr = 0;
+		pmd_t *pmd;
+		bool is_target = false;
+
 		/*
 		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
 		 * got written to. These VMAs are likely not worth investing
@@ -1548,24 +1630,34 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * ptl. It has higher chance to recover THP for the VMA, but
 		 * has higher cost too.
 		 */
-		if (vma->anon_vma)
-			continue;
+		if (vma->anon_vma) {
+			result = SCAN_PAGE_ANON;
+			goto next;
+		}
 		addr = vma->vm_start +
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
-		if (addr & ~HPAGE_PMD_MASK)
-			continue;
-		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
-			continue;
+		if (addr & ~HPAGE_PMD_MASK ||
+		    vma->vm_end < addr + HPAGE_PMD_SIZE) {
+			result = SCAN_VMA_CHECK;
+			goto next;
+		}
 		mm = vma->vm_mm;
-		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
-			continue;
+		is_target = mm == target_mm && addr == target_addr;
+		result = find_pmd_or_thp_or_none(mm, addr, &pmd);
+		if (result != SCAN_SUCCEED)
+			goto next;
 		/*
 		 * We need exclusive mmap_lock to retract page table.
 		 *
 		 * We use trylock due to lock inversion: we need to acquire
 		 * mmap_lock while holding page lock. Fault path does it in
 		 * reverse order. Trylock is a way to avoid deadlock.
+		 *
+		 * Also, it's not MADV_COLLAPSE's job to collapse other
+		 * mappings - let khugepaged take care of them later.
		 */
-		if (mmap_write_trylock(mm)) {
+		result = SCAN_PTE_MAPPED_HUGEPAGE;
+		if ((cc->is_khugepaged || is_target) &&
+		    mmap_write_trylock(mm)) {
			/*
			 * When a vma is registered with uffd-wp, we can't
			 * recycle the pmd pgtable because there can be pte
			 * markers installed. Skip it only, so the rest mm/vma
			 * can still have the same file mapped hugely, however
			 * it'll always mapped in small page size for uffd-wp
			 * registered ranges.
			 */
-			if (!hpage_collapse_test_exit(mm) &&
-			    !userfaultfd_wp(vma))
-				collapse_and_free_pmd(mm, vma, addr, pmd);
+			if (hpage_collapse_test_exit(mm)) {
+				result = SCAN_ANY_PROCESS;
+				goto unlock_next;
+			}
+			if (userfaultfd_wp(vma)) {
+				result = SCAN_PTE_UFFD_WP;
+				goto unlock_next;
+			}
+			collapse_and_free_pmd(mm, vma, addr, pmd);
+			if (!cc->is_khugepaged && is_target)
+				result = set_huge_pmd(vma, addr, pmd, hpage);
+			else
+				result = SCAN_SUCCEED;
+
+unlock_next:
			mmap_write_unlock(mm);
-		} else {
-			/* Try again later */
+			goto next;
+		}
+		/*
+		 * Calling context will handle target mm/addr. Otherwise, let
+		 * khugepaged try again later.
+		 */
+		if (!is_target) {
			khugepaged_add_pte_mapped_thp(mm, addr);
+			continue;
		}
+next:
+		if (is_target)
+			target_result = result;
	}
	i_mmap_unlock_write(mapping);
+	return target_result;
 }
 
 /**
  * collapse_file - collapse filemap/tmpfs/shmem pages into huge one.
  *
  * @mm: process address space where collapse happens
+ * @addr: virtual collapse start address
  * @file: file that collapse on
  * @start: collapse start address
  * @cc: collapse context and scratchpad
@@ -1609,8 +1724,9 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * + restore gaps in the page cache;
  * + unlock and free huge page;
  */
-static int collapse_file(struct mm_struct *mm, struct file *file,
-			 pgoff_t start, struct collapse_control *cc)
+static int collapse_file(struct mm_struct *mm, unsigned long addr,
+			 struct file *file, pgoff_t start,
+			 struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *hpage;
@@ -1912,7 +2028,8 @@ static int collapse_file(struct mm_struct *mm, struct file *file,
 		/*
 		 * Remove pte page tables, so we can re-fault the page as huge.
		 */
-		retract_page_tables(mapping, start);
+		result = retract_page_tables(mapping, start, mm, addr, hpage,
+					     cc);
 		unlock_page(hpage);
 		hpage = NULL;
	} else {
@@ -1968,8 +2085,9 @@ static int collapse_file(struct mm_struct *mm, struct file *file,
 	return result;
 }
 
-static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
-				pgoff_t start, struct collapse_control *cc)
+static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
+				    struct file *file, pgoff_t start,
+				    struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2049,7 +2167,7 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			result = collapse_file(mm, file, start, cc);
+			result = collapse_file(mm, addr, file, start, cc);
 		}
 	}
 
@@ -2057,8 +2175,9 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 	return result;
 }
 #else
-static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
-				pgoff_t start, struct collapse_control *cc)
+static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
+				    struct file *file, pgoff_t start,
+				    struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2153,8 +2272,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 							khugepaged_scan.address);
 
 				mmap_read_unlock(mm);
-				*result = khugepaged_scan_file(mm, file, pgoff,
-							       cc);
+				*result = hpage_collapse_scan_file(mm,
+						khugepaged_scan.address,
+						file, pgoff, cc);
 				mmap_locked = false;
 				fput(file);
 			} else {
@@ -2452,10 +2572,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 
 	*prev = vma;
 
-	/* TODO: Support file/shmem */
-	if (!vma->anon_vma || !vma_is_anonymous(vma))
-		return -EINVAL;
-
 	if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
 		return -EINVAL;
 
@@ -2486,16 +2602,35 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		}
 		mmap_assert_locked(mm);
 		memset(cc->node_load, 0, sizeof(cc->node_load));
-		result = hpage_collapse_scan_pmd(mm, vma, addr, &mmap_locked,
-						 cc);
+		if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+			struct file *file = get_file(vma->vm_file);
+			pgoff_t pgoff = linear_page_index(vma, addr);
+
+			mmap_read_unlock(mm);
+			mmap_locked = false;
+			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
+							  cc);
+			fput(file);
+		} else {
+			result = hpage_collapse_scan_pmd(mm, vma, addr,
+							 &mmap_locked, cc);
+		}
 		if (!mmap_locked)
 			*prev = NULL;  /* Tell caller we dropped mmap_lock */
 
+handle_result:
 		switch (result) {
 		case SCAN_SUCCEED:
 		case SCAN_PMD_MAPPED:
 			++thps;
 			break;
+		case SCAN_PTE_MAPPED_HUGEPAGE:
+			BUG_ON(mmap_locked);
+			BUG_ON(*prev);
+			mmap_write_lock(mm);
+			result = collapse_pte_mapped_thp(mm, addr, true);
+			mmap_write_unlock(mm);
+			goto handle_result;
		/* Whitelisted set of results where continuing OK */
		case SCAN_PMD_NULL:
		case SCAN_PTE_NON_PRESENT:
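As a userspace usage sketch (not part of the patch; the file path is
hypothetical, and MADV_COLLAPSE is assumed to come from the uapi
headers, with 25 as a fallback value for older headers), a process can
now request synchronous collapse of a file- or shmem-backed range:

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#ifndef MADV_COLLAPSE
	#define MADV_COLLAPSE 25	/* uapi value; fallback for old headers */
	#endif

	int main(void)
	{
		const size_t len = 2UL << 20;	/* one PMD-sized range on x86_64 */
		/* hypothetical tmpfs file, assumed to be at least len bytes */
		int fd = open("/dev/shm/example", O_RDWR);
		char *p;

		if (fd < 0)
			return EXIT_FAILURE;
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return EXIT_FAILURE;

		/* Synchronously back [p, p + len) with a THP, if possible */
		if (madvise(p, len, MADV_COLLAPSE))
			perror("madvise(MADV_COLLAPSE)");

		munmap(p, len);
		close(fd);
		return EXIT_SUCCESS;
	}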
From patchwork Fri Aug 26 22:03:23 2022
Message-ID: <20220826220329.1495407-5-zokeefe@google.com>
Subject: [PATCH mm-unstable v2 4/9] mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
From: "Zach O'Keefe"
To: linux-mm@kvack.org
Shutemov" , Minchan Kim , Patrick Xia , "Zach O'Keefe" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1661551421; a=rsa-sha256; cv=none; b=AolsuZ/YFsqlYP+C65WIjzSj7GQ1YQXGafGRWPQSprdwCyRtm0+07znwJBk/5/uFMutWrI XZxesSGLXq4Kza4iOO4PUwiigcK7t6N4FurJUWQEF4jHGrFZk/Z3osAKix1d6xxoiMT/bN wr/Pk6xhpJ6jCFy3gYpPK0l9zJAHNMs= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=XvGjn7gH; spf=pass (imf11.hostedemail.com: domain of 3PEMJYwcKCAc6vrllmlnvvnsl.jvtspu14-ttr2hjr.vyn@flex--zokeefe.bounces.google.com designates 209.85.214.201 as permitted sender) smtp.mailfrom=3PEMJYwcKCAc6vrllmlnvvnsl.jvtspu14-ttr2hjr.vyn@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661551421; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xLLp8vTNhH59q1o8fD2SQ2kq8Qu3/LkoSxB3VWIQLC0=; b=8eo15QC10ozk5Q1vDaXtYzF+31J9R2OAJWcEF/z9KLMOM1B7OszrHNzXJ59UrdY8tTof7N hL9Y3t4PW9Bk4N3gDeiLc2yqT3ANuUwBQKr6UNFvCovi2AnuXyi2sRrIbXFkxbXGbBKUuX 8PfxwAlXXgKj1AZiwMvjHxrOISsVDbY= X-Stat-Signature: w479g9h8c4gh93fdyfrmazbgoqamt8hy X-Rspamd-Queue-Id: B66934000C X-Rspam-User: X-Rspamd-Server: rspam06 Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=XvGjn7gH; spf=pass (imf11.hostedemail.com: domain of 3PEMJYwcKCAc6vrllmlnvvnsl.jvtspu14-ttr2hjr.vyn@flex--zokeefe.bounces.google.com designates 209.85.214.201 as permitted sender) smtp.mailfrom=3PEMJYwcKCAc6vrllmlnvvnsl.jvtspu14-ttr2hjr.vyn@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1661551421-504267 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add huge_memory:trace_mm_khugepaged_scan_file tracepoint to hpage_collapse_scan_file() analogously to hpage_collapse_scan_pmd(). While this change is targeted at debugging MADV_COLLAPSE pathway, the "mm_khugepaged" prefix is retained for symmetry with huge_memory:trace_mm_khugepaged_scan_pmd, which retains it's legacy name to prevent changing kernel ABI as much as possible. Signed-off-by: Zach O'Keefe --- include/trace/events/huge_memory.h | 34 ++++++++++++++++++++++++++++++ mm/khugepaged.c | 3 ++- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index a8db658e99e9..1d2dd88dc3c4 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -169,5 +169,39 @@ TRACE_EVENT(mm_collapse_huge_page_swapin, __entry->ret) ); +TRACE_EVENT(mm_khugepaged_scan_file, + + TP_PROTO(struct mm_struct *mm, struct page *page, const char *filename, + int present, int swap, int result), + + TP_ARGS(mm, page, filename, present, swap, result), + + TP_STRUCT__entry( + __field(struct mm_struct *, mm) + __field(unsigned long, pfn) + __string(filename, filename) + __field(int, present) + __field(int, swap) + __field(int, result) + ), + + TP_fast_assign( + __entry->mm = mm; + __entry->pfn = page ? 
+		__entry->pfn = page ? page_to_pfn(page) : -1;
+		__assign_str(filename, filename);
+		__entry->present = present;
+		__entry->swap = swap;
+		__entry->result = result;
+	),
+
+	TP_printk("mm=%p, scan_pfn=0x%lx, filename=%s, present=%d, swap=%d, result=%s",
+		__entry->mm,
+		__entry->pfn,
+		__get_str(filename),
+		__entry->present,
+		__entry->swap,
+		__print_symbolic(__entry->result, SCAN_STATUS))
+);
+
 #endif /* __HUGE_MEMORY_H */
 
 #include <trace/define_trace.h>

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 34c0c74b3839..55144f33ba09 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2171,7 +2171,8 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 		}
 	}
 
-	/* TODO: tracepoints */
+	trace_mm_khugepaged_scan_file(mm, page, file->f_path.dentry->d_iname,
+				      present, swap, result);
 	return result;
 }
 #else
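Once applied, the new event can be enabled like any other tracepoint,
e.g. (assuming the usual tracefs mount point):

	# echo 1 > /sys/kernel/tracing/events/huge_memory/mm_khugepaged_scan_file/enable
	# cat /sys/kernel/tracing/trace_pipe

Each file/shmem scan then logs the mm, the scanned pfn, the filename,
the present/swap counts, and the symbolic scan result.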
From patchwork Fri Aug 26 22:03:24 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12956669
Subject: [PATCH mm-unstable v2 5/9] selftests/vm: dedup THP helpers
From: "Zach O'Keefe" <zokeefe@google.com>
To: linux-mm@kvack.org
Date: Fri, 26 Aug 2022 15:03:24 -0700
Message-ID: <20220826220329.1495407-6-zokeefe@google.com>
In-Reply-To: <20220826220329.1495407-1-zokeefe@google.com>

These files:

tools/testing/selftests/vm/vm_util.c
tools/testing/selftests/vm/khugepaged.c

Both contain logic to:

1) Determine hugepage size on current system
2) Read /proc/self/smaps to determine number of THPs at an address

Refactor selftests/vm/khugepaged.c to use the vm_util common helpers and add it as a build dependency. Since selftests/vm/khugepaged.c is the largest user of check_huge(), change the signature of check_huge() to match selftests/vm/khugepaged.c's usage: take an expected number of hugepages, and return a bool indicating whether the correct number of hugepages were found. Add a wrapper, check_huge_anon(), in anticipation of checking smaps for file and shmem hugepages. Update existing callsites to use the new pattern / function.

Likewise, check_for_pattern() was duplicated, and it's a general enough helper to include in the vm_util helpers as well.
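[Editorial aside: a sketch of the reworked helper API, not part of the series. It assumes it is built inside tools/testing/selftests/vm/ and linked against vm_util.c, on a kernel with PMD-sized THP available:

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include "vm_util.h"	/* from tools/testing/selftests/vm/ */

int main(void)
{
	uint64_t hpage_size = read_pmd_pagesize();
	/* Over-allocate so a PMD-aligned address can be carved out. */
	size_t len = 2 * hpage_size;
	char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *p;

	if (map == MAP_FAILED)
		return 1;
	p = (char *)(((uintptr_t)map + hpage_size - 1) & ~(hpage_size - 1));

	madvise(p, hpage_size, MADV_HUGEPAGE);
	memset(p, 1, hpage_size);	/* fault in; may be THP-backed */

	/* New-style check: expect exactly one PMD-sized hugepage at p. */
	printf("THP mapped: %s\n",
	       check_huge_anon(p, 1, hpage_size) ? "yes" : "no");
	munmap(map, len);
	return 0;
}

Depending on the THP sysfs settings, the check can legitimately report "no"; the selftests pin those settings before relying on it.]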
Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/khugepaged.c | 64 ++----------------- tools/testing/selftests/vm/soft-dirty.c | 2 +- .../selftests/vm/split_huge_page_test.c | 12 ++-- tools/testing/selftests/vm/vm_util.c | 26 +++++--- tools/testing/selftests/vm/vm_util.h | 3 +- 6 files changed, 32 insertions(+), 76 deletions(-) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index b52f2cc51482..df4fa77febca 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -97,6 +97,7 @@ TEST_FILES += va_128TBswitch.sh include ../lib.mk +$(OUTPUT)/khugepaged: vm_util.c $(OUTPUT)/madv_populate: vm_util.c $(OUTPUT)/soft-dirty: vm_util.c $(OUTPUT)/split_huge_page_test: vm_util.c diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index b77b1e28cdb3..e5c602f7a18b 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -11,6 +11,8 @@ #include #include +#include "vm_util.h" + #ifndef MADV_PAGEOUT #define MADV_PAGEOUT 21 #endif @@ -351,64 +353,12 @@ static void save_settings(void) signal(SIGQUIT, restore_settings); } -#define MAX_LINE_LENGTH 500 - -static bool check_for_pattern(FILE *fp, char *pattern, char *buf) -{ - while (fgets(buf, MAX_LINE_LENGTH, fp) != NULL) { - if (!strncmp(buf, pattern, strlen(pattern))) - return true; - } - return false; -} - static bool check_huge(void *addr, int nr_hpages) { - bool thp = false; - int ret; - FILE *fp; - char buffer[MAX_LINE_LENGTH]; - char addr_pattern[MAX_LINE_LENGTH]; - - ret = snprintf(addr_pattern, MAX_LINE_LENGTH, "%08lx-", - (unsigned long) addr); - if (ret >= MAX_LINE_LENGTH) { - printf("%s: Pattern is too long\n", __func__); - exit(EXIT_FAILURE); - } - - - fp = fopen(PID_SMAPS, "r"); - if (!fp) { - printf("%s: Failed to open file %s\n", __func__, PID_SMAPS); - exit(EXIT_FAILURE); - } - if (!check_for_pattern(fp, addr_pattern, buffer)) - goto err_out; - - ret = snprintf(addr_pattern, MAX_LINE_LENGTH, "AnonHugePages:%10ld kB", - nr_hpages * (hpage_pmd_size >> 10)); - if (ret >= MAX_LINE_LENGTH) { - printf("%s: Pattern is too long\n", __func__); - exit(EXIT_FAILURE); - } - /* - * Fetch the AnonHugePages: in the same block and check whether it got - * the expected number of hugeepages next.
- */ - if (!check_for_pattern(fp, "AnonHugePages:", buffer)) - goto err_out; - - if (strncmp(buffer, addr_pattern, strlen(addr_pattern))) - goto err_out; - - thp = true; -err_out: - fclose(fp); - return thp; + return check_huge_anon(addr, nr_hpages, hpage_pmd_size); } - +#define MAX_LINE_LENGTH 500 static bool check_swap(void *addr, unsigned long size) { bool swap = false; @@ -430,7 +380,7 @@ static bool check_swap(void *addr, unsigned long size) printf("%s: Failed to open file %s\n", __func__, PID_SMAPS); exit(EXIT_FAILURE); } - if (!check_for_pattern(fp, addr_pattern, buffer)) + if (!check_for_pattern(fp, addr_pattern, buffer, sizeof(buffer))) goto err_out; ret = snprintf(addr_pattern, MAX_LINE_LENGTH, "Swap:%19ld kB", @@ -443,7 +393,7 @@ static bool check_swap(void *addr, unsigned long size) * Fetch the Swap: in the same block and check whether it got * the expected number of hugeepages next. */ - if (!check_for_pattern(fp, "Swap:", buffer)) + if (!check_for_pattern(fp, "Swap:", buffer, sizeof(buffer))) goto err_out; if (strncmp(buffer, addr_pattern, strlen(addr_pattern))) @@ -1045,7 +995,7 @@ int main(int argc, const char **argv) setbuf(stdout, NULL); page_size = getpagesize(); - hpage_pmd_size = read_num("hpage_pmd_size"); + hpage_pmd_size = read_pmd_pagesize(); hpage_pmd_nr = hpage_pmd_size / page_size; default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1; diff --git a/tools/testing/selftests/vm/soft-dirty.c b/tools/testing/selftests/vm/soft-dirty.c index e3a43f5d4fa2..21d8830c5f24 100644 --- a/tools/testing/selftests/vm/soft-dirty.c +++ b/tools/testing/selftests/vm/soft-dirty.c @@ -91,7 +91,7 @@ static void test_hugepage(int pagemap_fd, int pagesize) for (i = 0; i < hpage_len; i++) map[i] = (char)i; - if (check_huge(map)) { + if (check_huge_anon(map, 1, hpage_len)) { ksft_test_result_pass("Test %s huge page allocation\n", __func__); clear_softdirty(); diff --git a/tools/testing/selftests/vm/split_huge_page_test.c b/tools/testing/selftests/vm/split_huge_page_test.c index 6aa2b8253aed..76e1c36dd9e5 100644 --- a/tools/testing/selftests/vm/split_huge_page_test.c +++ b/tools/testing/selftests/vm/split_huge_page_test.c @@ -92,7 +92,6 @@ void split_pmd_thp(void) { char *one_page; size_t len = 4 * pmd_pagesize; - uint64_t thp_size; size_t i; one_page = memalign(pmd_pagesize, len); @@ -107,8 +106,7 @@ void split_pmd_thp(void) for (i = 0; i < len; i++) one_page[i] = (char)i; - thp_size = check_huge(one_page); - if (!thp_size) { + if (!check_huge_anon(one_page, 1, pmd_pagesize)) { printf("No THP is allocated\n"); exit(EXIT_FAILURE); } @@ -124,9 +122,8 @@ void split_pmd_thp(void) } - thp_size = check_huge(one_page); - if (thp_size) { - printf("Still %ld kB AnonHugePages not split\n", thp_size); + if (check_huge_anon(one_page, 0, pmd_pagesize)) { + printf("Still AnonHugePages not split\n"); exit(EXIT_FAILURE); } @@ -172,8 +169,7 @@ void split_pte_mapped_thp(void) for (i = 0; i < len; i++) one_page[i] = (char)i; - thp_size = check_huge(one_page); - if (!thp_size) { + if (!check_huge_anon(one_page, 1, pmd_pagesize)) { printf("No THP is allocated\n"); exit(EXIT_FAILURE); } diff --git a/tools/testing/selftests/vm/vm_util.c b/tools/testing/selftests/vm/vm_util.c index b58ab11a7a30..9dae51b8219f 100644 --- a/tools/testing/selftests/vm/vm_util.c +++ b/tools/testing/selftests/vm/vm_util.c @@ -42,9 +42,9 @@ void clear_softdirty(void) ksft_exit_fail_msg("writing clear_refs failed\n"); } -static bool check_for_pattern(FILE *fp, const char *pattern, char *buf) +bool check_for_pattern(FILE *fp, const 
char *pattern, char *buf, size_t len) { - while (fgets(buf, MAX_LINE_LENGTH, fp) != NULL) { + while (fgets(buf, len, fp)) { if (!strncmp(buf, pattern, strlen(pattern))) return true; } @@ -72,9 +72,10 @@ uint64_t read_pmd_pagesize(void) return strtoul(buf, NULL, 10); } -uint64_t check_huge(void *addr) +bool __check_huge(void *addr, char *pattern, int nr_hpages, + uint64_t hpage_size) { - uint64_t thp = 0; + uint64_t thp = -1; int ret; FILE *fp; char buffer[MAX_LINE_LENGTH]; @@ -89,20 +90,27 @@ uint64_t check_huge(void *addr) if (!fp) ksft_exit_fail_msg("%s: Failed to open file %s\n", __func__, SMAP_FILE_PATH); - if (!check_for_pattern(fp, addr_pattern, buffer)) + if (!check_for_pattern(fp, addr_pattern, buffer, sizeof(buffer))) goto err_out; /* - * Fetch the AnonHugePages: in the same block and check the number of + * Fetch the pattern in the same block and check the number of * hugepages. */ - if (!check_for_pattern(fp, "AnonHugePages:", buffer)) + if (!check_for_pattern(fp, pattern, buffer, sizeof(buffer))) goto err_out; - if (sscanf(buffer, "AnonHugePages:%10ld kB", &thp) != 1) + snprintf(addr_pattern, MAX_LINE_LENGTH, "%s%%9ld kB", pattern); + + if (sscanf(buffer, addr_pattern, &thp) != 1) ksft_exit_fail_msg("Reading smap error\n"); err_out: fclose(fp); - return thp; + return thp == (nr_hpages * (hpage_size >> 10)); +} + +bool check_huge_anon(void *addr, int nr_hpages, uint64_t hpage_size) +{ + return __check_huge(addr, "AnonHugePages: ", nr_hpages, hpage_size); } diff --git a/tools/testing/selftests/vm/vm_util.h b/tools/testing/selftests/vm/vm_util.h index 2e512bd57ae1..8434ea0c95cd 100644 --- a/tools/testing/selftests/vm/vm_util.h +++ b/tools/testing/selftests/vm/vm_util.h @@ -5,5 +5,6 @@ uint64_t pagemap_get_entry(int fd, char *start); bool pagemap_is_softdirty(int fd, char *start); void clear_softdirty(void); +bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t len); uint64_t read_pmd_pagesize(void); -uint64_t check_huge(void *addr); +bool check_huge_anon(void *addr, int nr_hpages, uint64_t hpage_size);

From patchwork Fri Aug 26 22:03:25 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12956670
Subject: [PATCH mm-unstable v2 6/9] selftests/vm: modularize thp collapse memory operations
From: "Zach O'Keefe" <zokeefe@google.com>
To: linux-mm@kvack.org
Date: Fri, 26 Aug 2022 15:03:25 -0700
Message-ID: <20220826220329.1495407-7-zokeefe@google.com>
In-Reply-To: <20220826220329.1495407-1-zokeefe@google.com>

Modularize operations to setup, cleanup, fault, and check for huge pages, for a given memory type. This allows reusing existing tests with additional memory types by defining new memory operations. Following patches will add file and shmem memory types.
Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 366 +++++++++++++----------- 1 file changed, 200 insertions(+), 166 deletions(-) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index e5c602f7a18b..b4b1709507a5 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -28,8 +28,16 @@ static int hpage_pmd_nr; #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/" #define PID_SMAPS "/proc/self/smaps" +struct mem_ops { + void *(*setup_area)(int nr_hpages); + void (*cleanup_area)(void *p, unsigned long size); + void (*fault)(void *p, unsigned long start, unsigned long end); + bool (*check_huge)(void *addr, int nr_hpages); +}; + struct collapse_context { - void (*collapse)(const char *msg, char *p, int nr_hpages, bool expect); + void (*collapse)(const char *msg, char *p, int nr_hpages, + struct mem_ops *ops, bool expect); bool enforce_pte_scan_limits; }; @@ -353,11 +361,6 @@ static void save_settings(void) signal(SIGQUIT, restore_settings); } -static bool check_huge(void *addr, int nr_hpages) -{ - return check_huge_anon(addr, nr_hpages, hpage_pmd_size); -} - #define MAX_LINE_LENGTH 500 static bool check_swap(void *addr, unsigned long size) { @@ -431,18 +434,25 @@ static void fill_memory(int *p, unsigned long start, unsigned long end) * Returns pmd-mapped hugepage in VMA marked VM_HUGEPAGE, filled with * validate_memory()'able contents. */ -static void *alloc_hpage(void) +static void *alloc_hpage(struct mem_ops *ops) { - void *p; + void *p = ops->setup_area(1); - p = alloc_mapping(1); + ops->fault(p, 0, hpage_pmd_size); + if (madvise(p, hpage_pmd_size, MADV_HUGEPAGE)) { + perror("madvise(MADV_HUGEPAGE)"); + exit(EXIT_FAILURE); + } printf("Allocate huge page..."); - madvise(p, hpage_pmd_size, MADV_HUGEPAGE); - fill_memory(p, 0, hpage_pmd_size); - if (check_huge(p, 1)) - success("OK"); - else - fail("Fail"); + if (madvise(p, hpage_pmd_size, MADV_COLLAPSE)) { + perror("madvise(MADV_COLLAPSE)"); + exit(EXIT_FAILURE); + } + if (!ops->check_huge(p, 1)) { + perror("madvise(MADV_COLLAPSE)"); + exit(EXIT_FAILURE); + } + success("OK"); return p; } @@ -459,18 +469,40 @@ static void validate_memory(int *p, unsigned long start, unsigned long end) } } -static void madvise_collapse(const char *msg, char *p, int nr_hpages, - bool expect) +static void *anon_setup_area(int nr_hpages) +{ + return alloc_mapping(nr_hpages); +} + +static void anon_cleanup_area(void *p, unsigned long size) +{ + munmap(p, size); +} + +static void anon_fault(void *p, unsigned long start, unsigned long end) +{ + fill_memory(p, start, end); +} + +static bool anon_check_huge(void *addr, int nr_hpages) +{ + return check_huge_anon(addr, nr_hpages, hpage_pmd_size); +} + +static struct mem_ops anon_ops = { + .setup_area = &anon_setup_area, + .cleanup_area = &anon_cleanup_area, + .fault = &anon_fault, + .check_huge = &anon_check_huge, +}; + +static void __madvise_collapse(const char *msg, char *p, int nr_hpages, + struct mem_ops *ops, bool expect) { int ret; struct settings settings = *current_settings(); printf("%s...", msg); - /* Sanity check */ - if (!check_huge(p, 0)) { - printf("Unexpected huge page\n"); - exit(EXIT_FAILURE); - } /* * Prevent khugepaged interference and tests that MADV_COLLAPSE @@ -482,9 +514,10 @@ static void madvise_collapse(const char *msg, char *p, int nr_hpages, /* Clear VM_NOHUGEPAGE */ madvise(p, nr_hpages * hpage_pmd_size, MADV_HUGEPAGE); ret = madvise(p, nr_hpages * hpage_pmd_size, MADV_COLLAPSE); 
+ if (((bool)ret) == expect) fail("Fail: Bad return value"); - else if (check_huge(p, nr_hpages) != expect) + else if (!ops->check_huge(p, expect ? nr_hpages : 0)) fail("Fail: check_huge()"); else success("OK"); @@ -492,14 +525,26 @@ static void madvise_collapse(const char *msg, char *p, int nr_hpages, pop_settings(); } +static void madvise_collapse(const char *msg, char *p, int nr_hpages, + struct mem_ops *ops, bool expect) +{ + /* Sanity check */ + if (!ops->check_huge(p, 0)) { + printf("Unexpected huge page\n"); + exit(EXIT_FAILURE); + } + __madvise_collapse(msg, p, nr_hpages, ops, expect); +} + #define TICK 500000 -static bool wait_for_scan(const char *msg, char *p, int nr_hpages) +static bool wait_for_scan(const char *msg, char *p, int nr_hpages, + struct mem_ops *ops) { int full_scans; int timeout = 6; /* 3 seconds */ /* Sanity check */ - if (!check_huge(p, 0)) { + if (!ops->check_huge(p, 0)) { printf("Unexpected huge page\n"); exit(EXIT_FAILURE); } @@ -511,7 +556,7 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages) printf("%s...", msg); while (timeout--) { - if (check_huge(p, nr_hpages)) + if (ops->check_huge(p, nr_hpages)) break; if (read_num("khugepaged/full_scans") >= full_scans) break; @@ -525,19 +570,20 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages) } static void khugepaged_collapse(const char *msg, char *p, int nr_hpages, - bool expect) + struct mem_ops *ops, bool expect) { - if (wait_for_scan(msg, p, nr_hpages)) { + if (wait_for_scan(msg, p, nr_hpages, ops)) { if (expect) fail("Timeout"); else success("OK"); return; - } else if (check_huge(p, nr_hpages) == expect) { + } + + if (ops->check_huge(p, expect ? nr_hpages : 0)) success("OK"); - } else { + else fail("Fail"); - } } static void alloc_at_fault(void) @@ -551,7 +597,7 @@ static void alloc_at_fault(void) p = alloc_mapping(1); *p = 1; printf("Allocate huge page on fault..."); - if (check_huge(p, 1)) + if (check_huge_anon(p, 1, hpage_pmd_size)) success("OK"); else fail("Fail"); @@ -560,49 +606,48 @@ static void alloc_at_fault(void) madvise(p, page_size, MADV_DONTNEED); printf("Split huge PMD on MADV_DONTNEED..."); - if (check_huge(p, 0)) + if (check_huge_anon(p, 0, hpage_pmd_size)) success("OK"); else fail("Fail"); munmap(p, hpage_pmd_size); } -static void collapse_full(struct collapse_context *c) +static void collapse_full(struct collapse_context *c, struct mem_ops *ops) { void *p; int nr_hpages = 4; unsigned long size = nr_hpages * hpage_pmd_size; - p = alloc_mapping(nr_hpages); - fill_memory(p, 0, size); + p = ops->setup_area(nr_hpages); + ops->fault(p, 0, size); c->collapse("Collapse multiple fully populated PTE table", p, nr_hpages, - true); + ops, true); validate_memory(p, 0, size); - munmap(p, size); + ops->cleanup_area(p, size); } -static void collapse_empty(struct collapse_context *c) +static void collapse_empty(struct collapse_context *c, struct mem_ops *ops) { void *p; - p = alloc_mapping(1); - c->collapse("Do not collapse empty PTE table", p, 1, false); - munmap(p, hpage_pmd_size); + p = ops->setup_area(1); + c->collapse("Do not collapse empty PTE table", p, 1, ops, false); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_single_pte_entry(struct collapse_context *c) +static void collapse_single_pte_entry(struct collapse_context *c, struct mem_ops *ops) { void *p; - p = alloc_mapping(1); - fill_memory(p, 0, page_size); + p = ops->setup_area(1); + ops->fault(p, 0, page_size); c->collapse("Collapse PTE table with single PTE entry present", p, - 1, true); - 
validate_memory(p, 0, page_size); - munmap(p, hpage_pmd_size); + 1, ops, true); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_max_ptes_none(struct collapse_context *c) +static void collapse_max_ptes_none(struct collapse_context *c, struct mem_ops *ops) { int max_ptes_none = hpage_pmd_nr / 2; struct settings settings = *current_settings(); @@ -611,30 +656,30 @@ static void collapse_max_ptes_none(struct collapse_context *c) settings.khugepaged.max_ptes_none = max_ptes_none; push_settings(&settings); - p = alloc_mapping(1); + p = ops->setup_area(1); - fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); + ops->fault(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); c->collapse("Maybe collapse with max_ptes_none exceeded", p, 1, - !c->enforce_pte_scan_limits); + ops, !c->enforce_pte_scan_limits); validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); if (c->enforce_pte_scan_limits) { - fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); - c->collapse("Collapse with max_ptes_none PTEs empty", p, 1, + ops->fault(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); + c->collapse("Collapse with max_ptes_none PTEs empty", p, 1, ops, true); validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); } - - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); pop_settings(); } -static void collapse_swapin_single_pte(struct collapse_context *c) +static void collapse_swapin_single_pte(struct collapse_context *c, struct mem_ops *ops) { void *p; - p = alloc_mapping(1); - fill_memory(p, 0, hpage_pmd_size); + + p = ops->setup_area(1); + ops->fault(p, 0, hpage_pmd_size); printf("Swapout one page..."); if (madvise(p, page_size, MADV_PAGEOUT)) { @@ -648,20 +693,21 @@ static void collapse_swapin_single_pte(struct collapse_context *c) goto out; } - c->collapse("Collapse with swapping in single PTE entry", p, 1, true); + c->collapse("Collapse with swapping in single PTE entry", p, 1, ops, + true); validate_memory(p, 0, hpage_pmd_size); out: - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_max_ptes_swap(struct collapse_context *c) +static void collapse_max_ptes_swap(struct collapse_context *c, struct mem_ops *ops) { int max_ptes_swap = read_num("khugepaged/max_ptes_swap"); void *p; - p = alloc_mapping(1); + p = ops->setup_area(1); + ops->fault(p, 0, hpage_pmd_size); - fill_memory(p, 0, hpage_pmd_size); printf("Swapout %d of %d pages...", max_ptes_swap + 1, hpage_pmd_nr); if (madvise(p, (max_ptes_swap + 1) * page_size, MADV_PAGEOUT)) { perror("madvise(MADV_PAGEOUT)"); @@ -674,12 +720,12 @@ static void collapse_max_ptes_swap(struct collapse_context *c) goto out; } - c->collapse("Maybe collapse with max_ptes_swap exceeded", p, 1, + c->collapse("Maybe collapse with max_ptes_swap exceeded", p, 1, ops, !c->enforce_pte_scan_limits); validate_memory(p, 0, hpage_pmd_size); if (c->enforce_pte_scan_limits) { - fill_memory(p, 0, hpage_pmd_size); + ops->fault(p, 0, hpage_pmd_size); printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr); if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) { @@ -694,63 +740,65 @@ static void collapse_max_ptes_swap(struct collapse_context *c) } c->collapse("Collapse with max_ptes_swap pages swapped out", p, - 1, true); + 1, ops, true); validate_memory(p, 0, hpage_pmd_size); } out: - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_single_pte_entry_compound(struct collapse_context *c) +static void 
collapse_single_pte_entry_compound(struct collapse_context *c, struct mem_ops *ops) { void *p; - p = alloc_hpage(); + p = alloc_hpage(ops); + madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); printf("Split huge page leaving single PTE mapping compound page..."); madvise(p + page_size, hpage_pmd_size - page_size, MADV_DONTNEED); - if (check_huge(p, 0)) + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); c->collapse("Collapse PTE table with single PTE mapping compound page", - p, 1, true); + p, 1, ops, true); validate_memory(p, 0, page_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_full_of_compound(struct collapse_context *c) +static void collapse_full_of_compound(struct collapse_context *c, struct mem_ops *ops) { void *p; - p = alloc_hpage(); + p = alloc_hpage(ops); printf("Split huge page leaving single PTE page table full of compound pages..."); madvise(p, page_size, MADV_NOHUGEPAGE); madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); - if (check_huge(p, 0)) + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); - c->collapse("Collapse PTE table full of compound pages", p, 1, true); + c->collapse("Collapse PTE table full of compound pages", p, 1, ops, + true); validate_memory(p, 0, hpage_pmd_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_compound_extreme(struct collapse_context *c) +static void collapse_compound_extreme(struct collapse_context *c, struct mem_ops *ops) { void *p; int i; - p = alloc_mapping(1); + p = ops->setup_area(1); for (i = 0; i < hpage_pmd_nr; i++) { printf("\rConstruct PTE page table full of different PTE-mapped compound pages %3d/%d...", i + 1, hpage_pmd_nr); madvise(BASE_ADDR, hpage_pmd_size, MADV_HUGEPAGE); - fill_memory(BASE_ADDR, 0, hpage_pmd_size); - if (!check_huge(BASE_ADDR, 1)) { + ops->fault(BASE_ADDR, 0, hpage_pmd_size); + if (!ops->check_huge(BASE_ADDR, 1)) { printf("Failed to allocate huge page\n"); exit(EXIT_FAILURE); } @@ -777,30 +825,30 @@ static void collapse_compound_extreme(struct collapse_context *c) } } - munmap(BASE_ADDR, hpage_pmd_size); - fill_memory(p, 0, hpage_pmd_size); - if (check_huge(p, 0)) + ops->cleanup_area(BASE_ADDR, hpage_pmd_size); + ops->fault(p, 0, hpage_pmd_size); + if (!ops->check_huge(p, 1)) success("OK"); else fail("Fail"); c->collapse("Collapse PTE table full of different compound pages", p, 1, - true); + ops, true); validate_memory(p, 0, hpage_pmd_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_fork(struct collapse_context *c) +static void collapse_fork(struct collapse_context *c, struct mem_ops *ops) { int wstatus; void *p; - p = alloc_mapping(1); + p = ops->setup_area(1); printf("Allocate small page..."); - fill_memory(p, 0, page_size); - if (check_huge(p, 0)) + ops->fault(p, 0, page_size); + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); @@ -811,17 +859,17 @@ static void collapse_fork(struct collapse_context *c) skip_settings_restore = true; exit_status = 0; - if (check_huge(p, 0)) + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); - fill_memory(p, page_size, 2 * page_size); + ops->fault(p, page_size, 2 * page_size); c->collapse("Collapse PTE table with single page shared with parent process", - p, 1, true); + p, 1, ops, true); validate_memory(p, 0, page_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); exit(exit_status); } @@ -829,27 +877,27 @@ static void collapse_fork(struct collapse_context *c) 
exit_status += WEXITSTATUS(wstatus); printf("Check if parent still has small page..."); - if (check_huge(p, 0)) + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); validate_memory(p, 0, page_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_fork_compound(struct collapse_context *c) +static void collapse_fork_compound(struct collapse_context *c, struct mem_ops *ops) { int wstatus; void *p; - p = alloc_hpage(); + p = alloc_hpage(ops); printf("Share huge page over fork()..."); if (!fork()) { /* Do not touch settings on child exit */ skip_settings_restore = true; exit_status = 0; - if (check_huge(p, 1)) + if (ops->check_huge(p, 1)) success("OK"); else fail("Fail"); @@ -857,20 +905,20 @@ static void collapse_fork_compound(struct collapse_context *c) printf("Split huge page PMD in child process..."); madvise(p, page_size, MADV_NOHUGEPAGE); madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); - if (check_huge(p, 0)) + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); - fill_memory(p, 0, page_size); + ops->fault(p, 0, page_size); write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1); c->collapse("Collapse PTE table full of compound pages in child", - p, 1, true); + p, 1, ops, true); write_num("khugepaged/max_ptes_shared", current_settings()->khugepaged.max_ptes_shared); validate_memory(p, 0, hpage_pmd_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); exit(exit_status); } @@ -878,59 +926,59 @@ static void collapse_fork_compound(struct collapse_context *c) exit_status += WEXITSTATUS(wstatus); printf("Check if parent still has huge page..."); - if (check_huge(p, 1)) + if (ops->check_huge(p, 1)) success("OK"); else fail("Fail"); validate_memory(p, 0, hpage_pmd_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void collapse_max_ptes_shared(struct collapse_context *c) +static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops *ops) { int max_ptes_shared = read_num("khugepaged/max_ptes_shared"); int wstatus; void *p; - p = alloc_hpage(); + p = alloc_hpage(ops); printf("Share huge page over fork()..."); if (!fork()) { /* Do not touch settings on child exit */ skip_settings_restore = true; exit_status = 0; - if (check_huge(p, 1)) + if (ops->check_huge(p, 1)) success("OK"); else fail("Fail"); printf("Trigger CoW on page %d of %d...", hpage_pmd_nr - max_ptes_shared - 1, hpage_pmd_nr); - fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size); - if (check_huge(p, 0)) + ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size); + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); c->collapse("Maybe collapse with max_ptes_shared exceeded", p, - 1, !c->enforce_pte_scan_limits); + 1, ops, !c->enforce_pte_scan_limits); if (c->enforce_pte_scan_limits) { printf("Trigger CoW on page %d of %d...", hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr); - fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * + ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size); - if (check_huge(p, 0)) + if (ops->check_huge(p, 0)) success("OK"); else fail("Fail"); c->collapse("Collapse with max_ptes_shared PTEs shared", - p, 1, true); + p, 1, ops, true); } validate_memory(p, 0, hpage_pmd_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); exit(exit_status); } @@ -938,42 +986,28 @@ static void collapse_max_ptes_shared(struct collapse_context *c) exit_status += WEXITSTATUS(wstatus); printf("Check if parent still has huge page..."); - 
if (check_huge(p, 1)) + if (ops->check_huge(p, 1)) success("OK"); else fail("Fail"); validate_memory(p, 0, hpage_pmd_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } -static void madvise_collapse_existing_thps(void) +static void madvise_collapse_existing_thps(struct collapse_context *c, + struct mem_ops *ops) { void *p; - int err; - p = alloc_mapping(1); - fill_memory(p, 0, hpage_pmd_size); + p = ops->setup_area(1); + ops->fault(p, 0, hpage_pmd_size); + c->collapse("Collapse fully populated PTE table...", p, 1, ops, true); + validate_memory(p, 0, hpage_pmd_size); - printf("Collapse fully populated PTE table..."); - /* - * Note that we don't set MADV_HUGEPAGE here, which - * also tests that VM_HUGEPAGE isn't required for - * MADV_COLLAPSE in "madvise" mode. - */ - err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); - if (err == 0 && check_huge(p, 1)) { - success("OK"); - printf("Re-collapse PMD-mapped hugepage"); - err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); - if (err == 0 && check_huge(p, 1)) - success("OK"); - else - fail("Fail"); - } else { - fail("Fail"); - } + /* c->collapse() will find a hugepage and complain - call directly. */ + __madvise_collapse("Re-collapse PMD-mapped hugepage", p, 1, ops, true); validate_memory(p, 0, hpage_pmd_size); - munmap(p, hpage_pmd_size); + ops->cleanup_area(p, hpage_pmd_size); } int main(int argc, const char **argv) @@ -1013,37 +1047,37 @@ c.collapse = &khugepaged_collapse; c.enforce_pte_scan_limits = true; - collapse_full(&c); - collapse_empty(&c); - collapse_single_pte_entry(&c); - collapse_max_ptes_none(&c); - collapse_swapin_single_pte(&c); - collapse_max_ptes_swap(&c); - collapse_single_pte_entry_compound(&c); - collapse_full_of_compound(&c); - collapse_compound_extreme(&c); - collapse_fork(&c); - collapse_fork_compound(&c); - collapse_max_ptes_shared(&c); + collapse_full(&c, &anon_ops); + collapse_empty(&c, &anon_ops); + collapse_single_pte_entry(&c, &anon_ops); + collapse_max_ptes_none(&c, &anon_ops); + collapse_swapin_single_pte(&c, &anon_ops); + collapse_max_ptes_swap(&c, &anon_ops); + collapse_single_pte_entry_compound(&c, &anon_ops); + collapse_full_of_compound(&c, &anon_ops); + collapse_compound_extreme(&c, &anon_ops); + collapse_fork(&c, &anon_ops); + collapse_fork_compound(&c, &anon_ops); + collapse_max_ptes_shared(&c, &anon_ops); } if (!strcmp(tests, "madvise") || !strcmp(tests, "all")) { printf("\n*** Testing context: madvise ***\n"); c.collapse = &madvise_collapse; c.enforce_pte_scan_limits = false; - collapse_full(&c); - collapse_empty(&c); - collapse_single_pte_entry(&c); - collapse_max_ptes_none(&c); - collapse_swapin_single_pte(&c); - collapse_max_ptes_swap(&c); - collapse_single_pte_entry_compound(&c); - collapse_full_of_compound(&c); - collapse_compound_extreme(&c); - collapse_fork(&c); - collapse_fork_compound(&c); - collapse_max_ptes_shared(&c); - madvise_collapse_existing_thps(); + collapse_full(&c, &anon_ops); + collapse_empty(&c, &anon_ops); + collapse_single_pte_entry(&c, &anon_ops); + collapse_max_ptes_none(&c, &anon_ops); + collapse_swapin_single_pte(&c, &anon_ops); + collapse_max_ptes_swap(&c, &anon_ops); + collapse_single_pte_entry_compound(&c, &anon_ops); + collapse_full_of_compound(&c, &anon_ops); + collapse_compound_extreme(&c, &anon_ops); + collapse_fork(&c, &anon_ops); + collapse_fork_compound(&c, &anon_ops); + collapse_max_ptes_shared(&c, &anon_ops); + madvise_collapse_existing_thps(&c, &anon_ops); } restore_settings(0);

From patchwork Fri Aug 26 22:03:26 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12956671
Subject: [PATCH mm-unstable v2 7/9] selftests/vm: add thp collapse file and tmpfs testing
From: "Zach O'Keefe" <zokeefe@google.com>
To: linux-mm@kvack.org
Date: Fri, 26 Aug 2022 15:03:26 -0700
Message-ID: <20220826220329.1495407-8-zokeefe@google.com>
In-Reply-To: <20220826220329.1495407-1-zokeefe@google.com>

Add memory operations for file-backed and tmpfs memory. Call existing tests with these new memory operations to test collapse functionality of khugepaged and MADV_COLLAPSE on file-backed and tmpfs memory. Not all tests are reusable; for example, collapse_swapin_single_pte() which checks swap usage.

Refactor test arguments. Usage is now:

Usage: ./khugepaged <test type> [dir]

	<test type>	: <context>:<mem_type>
	<context>	: [all|khugepaged|madvise]
	<mem_type>	: [all|anon|file]

	"file,all" mem_type requires [dir] argument

	"file,all" mem_type requires kernel built with CONFIG_READ_ONLY_THP_FOR_FS=y

	if [dir] is a (sub)directory of a tmpfs mount, tmpfs must be mounted with huge=madvise option for khugepaged tests to work

Refactor calling tests to make it clear what collapse context / memory operations they support, but only invoke tests requested by the user. Also log what test is being run, and with what context / memory, to make test logs more human readable.

A new test file is created and deleted for every test to ensure no pages remain in the page cache between tests.
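[Editorial aside, not part of the patch: the <test type> string is split on the first ':'. The standalone sketch below mirrors the strsep() logic of parse_test_type() in the diff that follows:

#define _GNU_SOURCE		/* for strsep() */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char arg[] = "khugepaged:file";		/* e.g. a copy of argv[1] */
	char *buf = arg;
	const char *context = strsep(&buf, ":");	/* -> "khugepaged" */
	const char *mem_type = buf;			/* -> "file" */

	printf("context=%s mem_type=%s\n", context, mem_type);
	return 0;
}]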
Also, disable read_ahead_kb so that pages don't find their way into the page cache without the tests faulting them in. Add file and shmem wrappers to vm_utils check for file and shmem hugepages in smaps. Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 401 ++++++++++++++++++++---- tools/testing/selftests/vm/vm_util.c | 10 + tools/testing/selftests/vm/vm_util.h | 2 + 3 files changed, 357 insertions(+), 56 deletions(-) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index b4b1709507a5..0ddfffb87411 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -1,6 +1,7 @@ #define _GNU_SOURCE #include #include +#include #include #include #include @@ -10,12 +11,20 @@ #include #include +#include +#include +#include + +#include "linux/magic.h" #include "vm_util.h" #ifndef MADV_PAGEOUT #define MADV_PAGEOUT 21 #endif +#ifndef MADV_POPULATE_READ +#define MADV_POPULATE_READ 22 +#endif #ifndef MADV_COLLAPSE #define MADV_COLLAPSE 25 #endif @@ -26,21 +35,48 @@ static unsigned long page_size; static int hpage_pmd_nr; #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/" +#define READ_AHEAD_SYSFS "/sys/block/sda/queue/read_ahead_kb" #define PID_SMAPS "/proc/self/smaps" +#define TEST_FILE "collapse_test_file" + +#define MAX_LINE_LENGTH 500 + +enum vma_type { + VMA_ANON, + VMA_FILE, + VMA_SHMEM, +}; struct mem_ops { void *(*setup_area)(int nr_hpages); void (*cleanup_area)(void *p, unsigned long size); void (*fault)(void *p, unsigned long start, unsigned long end); bool (*check_huge)(void *addr, int nr_hpages); + const char *name; }; +static struct mem_ops *file_ops; +static struct mem_ops *anon_ops; + struct collapse_context { void (*collapse)(const char *msg, char *p, int nr_hpages, struct mem_ops *ops, bool expect); bool enforce_pte_scan_limits; + const char *name; +}; + +static struct collapse_context *khugepaged_context; +static struct collapse_context *madvise_context; + +struct file_info { + const char *dir; + char path[MAX_LINE_LENGTH]; + enum vma_type type; + int fd; }; +static struct file_info finfo; + enum thp_enabled { THP_ALWAYS, THP_MADVISE, @@ -106,6 +142,7 @@ struct settings { enum shmem_enabled shmem_enabled; bool use_zero_page; struct khugepaged_settings khugepaged; + unsigned long read_ahead_kb; }; static struct settings saved_settings; @@ -124,6 +161,11 @@ static void fail(const char *msg) exit_status++; } +static void skip(const char *msg) +{ + printf(" \e[33m%s\e[0m\n", msg); +} + static int read_file(const char *path, char *buf, size_t buflen) { int fd; @@ -151,13 +193,19 @@ static int write_file(const char *path, const char *buf, size_t buflen) ssize_t numwritten; fd = open(path, O_WRONLY); - if (fd == -1) + if (fd == -1) { + printf("open(%s)\n", path); + exit(EXIT_FAILURE); return 0; + } numwritten = write(fd, buf, buflen - 1); close(fd); - if (numwritten < 1) + if (numwritten < 1) { + printf("write(%s)\n", buf); + exit(EXIT_FAILURE); return 0; + } return (unsigned int) numwritten; } @@ -224,20 +272,11 @@ static void write_string(const char *name, const char *val) } } -static const unsigned long read_num(const char *name) +static const unsigned long _read_num(const char *path) { - char path[PATH_MAX]; char buf[21]; - int ret; - - ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name); - if (ret >= PATH_MAX) { - printf("%s: Pathname is too long\n", __func__); - exit(EXIT_FAILURE); - } - ret = read_file(path, buf, sizeof(buf)); - if (ret < 0) { + if (read_file(path, buf, 
sizeof(buf)) < 0) { perror("read_file(read_num)"); exit(EXIT_FAILURE); } @@ -245,10 +284,9 @@ static const unsigned long read_num(const char *name) return strtoul(buf, NULL, 10); } -static void write_num(const char *name, unsigned long num) +static const unsigned long read_num(const char *name) { char path[PATH_MAX]; - char buf[21]; int ret; ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name); @@ -256,6 +294,12 @@ static void write_num(const char *name, unsigned long num) printf("%s: Pathname is too long\n", __func__); exit(EXIT_FAILURE); } + return _read_num(path); +} + +static void _write_num(const char *path, unsigned long num) +{ + char buf[21]; sprintf(buf, "%ld", num); if (!write_file(path, buf, strlen(buf) + 1)) { @@ -264,6 +308,19 @@ static void write_num(const char *name, unsigned long num) } } +static void write_num(const char *name, unsigned long num) +{ + char path[PATH_MAX]; + int ret; + + ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name); + if (ret >= PATH_MAX) { + printf("%s: Pathname is too long\n", __func__); + exit(EXIT_FAILURE); + } + _write_num(path, num); +} + static void write_settings(struct settings *settings) { struct khugepaged_settings *khugepaged = &settings->khugepaged; @@ -283,6 +340,8 @@ static void write_settings(struct settings *settings) write_num("khugepaged/max_ptes_swap", khugepaged->max_ptes_swap); write_num("khugepaged/max_ptes_shared", khugepaged->max_ptes_shared); write_num("khugepaged/pages_to_scan", khugepaged->pages_to_scan); + + _write_num(READ_AHEAD_SYSFS, settings->read_ahead_kb); } #define MAX_SETTINGS_DEPTH 4 @@ -341,6 +400,7 @@ static void save_settings(void) .shmem_enabled = read_string("shmem_enabled", shmem_enabled_strings), .use_zero_page = read_num("use_zero_page"), + .read_ahead_kb = _read_num(READ_AHEAD_SYSFS), }; saved_settings.khugepaged = (struct khugepaged_settings) { .defrag = read_num("khugepaged/defrag"), @@ -361,7 +421,6 @@ static void save_settings(void) signal(SIGQUIT, restore_settings); } -#define MAX_LINE_LENGTH 500 static bool check_swap(void *addr, unsigned long size) { bool swap = false; @@ -489,11 +548,109 @@ static bool anon_check_huge(void *addr, int nr_hpages) return check_huge_anon(addr, nr_hpages, hpage_pmd_size); } -static struct mem_ops anon_ops = { +static void *file_setup_area(int nr_hpages) +{ + struct stat path_stat; + struct statfs fs; + int fd; + void *p; + unsigned long size; + + stat(finfo.dir, &path_stat); + if (!S_ISDIR(path_stat.st_mode)) { + printf("%s: Not a directory (%s)\n", __func__, finfo.dir); + exit(EXIT_FAILURE); + } + if (snprintf(finfo.path, sizeof(finfo.path), "%s/" TEST_FILE, + finfo.dir) >= sizeof(finfo.path)) { + printf("%s: Pathname is too long\n", __func__); + exit(EXIT_FAILURE); + } + if (statfs(finfo.dir, &fs)) { + perror("statfs()"); + exit(EXIT_FAILURE); + } + finfo.type = fs.f_type == TMPFS_MAGIC ? VMA_SHMEM : VMA_FILE; + + unlink(finfo.path); /* Cleanup from previous failed tests */ + printf("Creating %s for collapse%s...", finfo.path, + finfo.type == VMA_SHMEM ? 
" (tmpfs)" : ""); + fd = open(finfo.path, O_DSYNC | O_CREAT | O_RDWR | O_TRUNC | O_EXCL, + 777); + if (fd < 0) { + perror("open()"); + exit(EXIT_FAILURE); + } + + size = nr_hpages * hpage_pmd_size; + p = alloc_mapping(nr_hpages); + fill_memory(p, 0, size); + write(fd, p, size); + close(fd); + munmap(p, size); + success("OK"); + + printf("Opening %s read only for collapse...", finfo.path); + finfo.fd = open(finfo.path, O_RDONLY, 777); + if (finfo.fd < 0) { + perror("open()"); + exit(EXIT_FAILURE); + } + p = mmap(BASE_ADDR, size, PROT_READ | PROT_EXEC, + MAP_PRIVATE, finfo.fd, 0); + if (p == MAP_FAILED || p != BASE_ADDR) { + perror("mmap()"); + exit(EXIT_FAILURE); + } + + /* Drop page cache */ + write_file("/proc/sys/vm/drop_caches", "3", 2); + success("OK"); + return p; +} + +static void file_cleanup_area(void *p, unsigned long size) +{ + munmap(p, size); + close(finfo.fd); + unlink(finfo.path); +} + +static void file_fault(void *p, unsigned long start, unsigned long end) +{ + if (madvise(((char *)p) + start, end - start, MADV_POPULATE_READ)) { + perror("madvise(MADV_POPULATE_READ"); + exit(EXIT_FAILURE); + } +} + +static bool file_check_huge(void *addr, int nr_hpages) +{ + switch (finfo.type) { + case VMA_FILE: + return check_huge_file(addr, nr_hpages, hpage_pmd_size); + case VMA_SHMEM: + return check_huge_shmem(addr, nr_hpages, hpage_pmd_size); + default: + exit(EXIT_FAILURE); + return false; + } +} + +static struct mem_ops __anon_ops = { .setup_area = &anon_setup_area, .cleanup_area = &anon_cleanup_area, .fault = &anon_fault, .check_huge = &anon_check_huge, + .name = "anon", +}; + +static struct mem_ops __file_ops = { + .setup_area = &file_setup_area, + .cleanup_area = &file_cleanup_area, + .fault = &file_fault, + .check_huge = &file_check_huge, + .name = "file", }; static void __madvise_collapse(const char *msg, char *p, int nr_hpages, @@ -509,6 +666,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages, * ignores /sys/kernel/mm/transparent_hugepage/enabled */ settings.thp_enabled = THP_NEVER; + settings.shmem_enabled = SHMEM_NEVER; push_settings(&settings); /* Clear VM_NOHUGEPAGE */ @@ -580,12 +738,37 @@ static void khugepaged_collapse(const char *msg, char *p, int nr_hpages, return; } + /* + * For file and shmem memory, khugepaged only retracts pte entries after + * putting the new hugepage in the page cache. The hugepage must be + * subsequently refaulted to install the pmd mapping for the mm. + */ + if (ops != &__anon_ops) + ops->fault(p, 0, nr_hpages * hpage_pmd_size); + if (ops->check_huge(p, expect ? 
nr_hpages : 0)) success("OK"); else fail("Fail"); } +static struct collapse_context __khugepaged_context = { + .collapse = &khugepaged_collapse, + .enforce_pte_scan_limits = true, + .name = "khugepaged", +}; + +static struct collapse_context __madvise_context = { + .collapse = &madvise_collapse, + .enforce_pte_scan_limits = false, + .name = "madvise", +}; + +static bool is_tmpfs(struct mem_ops *ops) +{ + return ops == &__file_ops && finfo.type == VMA_SHMEM; +} + static void alloc_at_fault(void) { struct settings settings = *current_settings(); @@ -658,6 +841,13 @@ static void collapse_max_ptes_none(struct collapse_context *c, struct mem_ops *o p = ops->setup_area(1); + if (is_tmpfs(ops)) { + /* shmem pages always in the page cache */ + printf("tmpfs..."); + skip("Skip"); + goto skip; + } + ops->fault(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); c->collapse("Maybe collapse with max_ptes_none exceeded", p, 1, ops, !c->enforce_pte_scan_limits); @@ -670,6 +860,7 @@ static void collapse_max_ptes_none(struct collapse_context *c, struct mem_ops *o validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); } +skip: ops->cleanup_area(p, hpage_pmd_size); pop_settings(); } @@ -753,6 +944,13 @@ static void collapse_single_pte_entry_compound(struct collapse_context *c, struc p = alloc_hpage(ops); + if (is_tmpfs(ops)) { + /* MADV_DONTNEED won't evict tmpfs pages */ + printf("tmpfs..."); + skip("Skip"); + goto skip; + } + madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); printf("Split huge page leaving single PTE mapping compound page..."); madvise(p + page_size, hpage_pmd_size - page_size, MADV_DONTNEED); @@ -764,6 +962,7 @@ static void collapse_single_pte_entry_compound(struct collapse_context *c, struc c->collapse("Collapse PTE table with single PTE mapping compound page", p, 1, ops, true); validate_memory(p, 0, page_size); +skip: ops->cleanup_area(p, hpage_pmd_size); } @@ -1010,9 +1209,72 @@ static void madvise_collapse_existing_thps(struct collapse_context *c, ops->cleanup_area(p, hpage_pmd_size); } +static void usage(void) +{ + fprintf(stderr, "\nUsage: ./khugepaged <test type> [dir]\n\n"); + fprintf(stderr, "\t<test type>\t: <context>:<mem_type>\n"); + fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n"); + fprintf(stderr, "\t<mem_type>\t: [all|anon|file]\n"); + fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n"); + fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n"); + fprintf(stderr, "\tCONFIG_READ_ONLY_THP_FOR_FS=y\n"); + fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n"); + fprintf(stderr, "\tmounted with huge=madvise option for khugepaged tests to work\n"); + exit(1); +} + +static void parse_test_type(int argc, const char **argv) +{ + char *buf; + const char *token; + + if (argc == 1) { + /* Backwards compatibility */ + khugepaged_context = &__khugepaged_context; + madvise_context = &__madvise_context; + anon_ops = &__anon_ops; + return; + } + + buf = strdup(argv[1]); + token = strsep(&buf, ":"); + + if (!strcmp(token, "all")) { + khugepaged_context = &__khugepaged_context; + madvise_context = &__madvise_context; + } else if (!strcmp(token, "khugepaged")) { + khugepaged_context = &__khugepaged_context; + } else if (!strcmp(token, "madvise")) { + madvise_context = &__madvise_context; + } else { + usage(); + } + + if (!buf) + usage(); + + if (!strcmp(buf, "all")) { + file_ops = &__file_ops; + anon_ops = &__anon_ops; + } else if (!strcmp(buf, "anon")) { + anon_ops = &__anon_ops; + } else if (!strcmp(buf, "file")) { + file_ops = &__file_ops; + } else {
usage(); + } + + if (!file_ops) + return; + + if (argc != 3) + usage(); + + finfo.dir = argv[2]; +} + int main(int argc, const char **argv) { - struct collapse_context c; struct settings default_settings = { .thp_enabled = THP_MADVISE, .thp_defrag = THP_DEFRAG_ALWAYS, @@ -1023,8 +1285,17 @@ int main(int argc, const char **argv) .alloc_sleep_millisecs = 10, .scan_sleep_millisecs = 10, }, + /* + * When testing file-backed memory, the collapse path + * looks at how many pages are found in the page cache, not + * what pages are mapped. Disable read ahead optimization so + * pages don't find their way into the page cache unless + * we mem_ops->fault() them in. + */ + .read_ahead_kb = 0, }; - const char *tests = argc == 1 ? "all" : argv[1]; + + parse_test_type(argc, argv); setbuf(stdout, NULL); @@ -1042,43 +1313,61 @@ int main(int argc, const char **argv) alloc_at_fault(); - if (!strcmp(tests, "khugepaged") || !strcmp(tests, "all")) { - printf("\n*** Testing context: khugepaged ***\n"); - c.collapse = &khugepaged_collapse; - c.enforce_pte_scan_limits = true; - - collapse_full(&c, &anon_ops); - collapse_empty(&c, &anon_ops); - collapse_single_pte_entry(&c, &anon_ops); - collapse_max_ptes_none(&c, &anon_ops); - collapse_swapin_single_pte(&c, &anon_ops); - collapse_max_ptes_swap(&c, &anon_ops); - collapse_single_pte_entry_compound(&c, &anon_ops); - collapse_full_of_compound(&c, &anon_ops); - collapse_compound_extreme(&c, &anon_ops); - collapse_fork(&c, &anon_ops); - collapse_fork_compound(&c, &anon_ops); - collapse_max_ptes_shared(&c, &anon_ops); - } - if (!strcmp(tests, "madvise") || !strcmp(tests, "all")) { - printf("\n*** Testing context: madvise ***\n"); - c.collapse = &madvise_collapse; - c.enforce_pte_scan_limits = false; - - collapse_full(&c, &anon_ops); - collapse_empty(&c, &anon_ops); - collapse_single_pte_entry(&c, &anon_ops); - collapse_max_ptes_none(&c, &anon_ops); - collapse_swapin_single_pte(&c, &anon_ops); - collapse_max_ptes_swap(&c, &anon_ops); - collapse_single_pte_entry_compound(&c, &anon_ops); - collapse_full_of_compound(&c, &anon_ops); - collapse_compound_extreme(&c, &anon_ops); - collapse_fork(&c, &anon_ops); - collapse_fork_compound(&c, &anon_ops); - collapse_max_ptes_shared(&c, &anon_ops); - madvise_collapse_existing_thps(&c, &anon_ops); - } +#define TEST(t, c, o) do { \ + if (c && o) { \ + printf("\nRun test: " #t " (%s:%s)\n", c->name, o->name); \ + t(c, o); \ + } \ + } while (0) + + TEST(collapse_full, khugepaged_context, anon_ops); + TEST(collapse_full, khugepaged_context, file_ops); + TEST(collapse_full, madvise_context, anon_ops); + TEST(collapse_full, madvise_context, file_ops); + + TEST(collapse_empty, khugepaged_context, anon_ops); + TEST(collapse_empty, madvise_context, anon_ops); + + TEST(collapse_single_pte_entry, khugepaged_context, anon_ops); + TEST(collapse_single_pte_entry, khugepaged_context, file_ops); + TEST(collapse_single_pte_entry, madvise_context, anon_ops); + TEST(collapse_single_pte_entry, madvise_context, file_ops); + + TEST(collapse_max_ptes_none, khugepaged_context, anon_ops); + TEST(collapse_max_ptes_none, khugepaged_context, file_ops); + TEST(collapse_max_ptes_none, madvise_context, anon_ops); + TEST(collapse_max_ptes_none, madvise_context, file_ops); + + TEST(collapse_single_pte_entry_compound, khugepaged_context, anon_ops); + TEST(collapse_single_pte_entry_compound, khugepaged_context, file_ops); + TEST(collapse_single_pte_entry_compound, madvise_context, anon_ops); + TEST(collapse_single_pte_entry_compound, madvise_context, file_ops); + + 
TEST(collapse_full_of_compound, khugepaged_context, anon_ops); + TEST(collapse_full_of_compound, khugepaged_context, file_ops); + TEST(collapse_full_of_compound, madvise_context, anon_ops); + TEST(collapse_full_of_compound, madvise_context, file_ops); + + TEST(collapse_compound_extreme, khugepaged_context, anon_ops); + TEST(collapse_compound_extreme, madvise_context, anon_ops); + + TEST(collapse_swapin_single_pte, khugepaged_context, anon_ops); + TEST(collapse_swapin_single_pte, madvise_context, anon_ops); + + TEST(collapse_max_ptes_swap, khugepaged_context, anon_ops); + TEST(collapse_max_ptes_swap, madvise_context, anon_ops); + + TEST(collapse_fork, khugepaged_context, anon_ops); + TEST(collapse_fork, madvise_context, anon_ops); + + TEST(collapse_fork_compound, khugepaged_context, anon_ops); + TEST(collapse_fork_compound, madvise_context, anon_ops); + + TEST(collapse_max_ptes_shared, khugepaged_context, anon_ops); + TEST(collapse_max_ptes_shared, madvise_context, anon_ops); + + TEST(madvise_collapse_existing_thps, madvise_context, anon_ops); + TEST(madvise_collapse_existing_thps, madvise_context, file_ops); restore_settings(0); } diff --git a/tools/testing/selftests/vm/vm_util.c b/tools/testing/selftests/vm/vm_util.c index 9dae51b8219f..f11f8adda521 100644 --- a/tools/testing/selftests/vm/vm_util.c +++ b/tools/testing/selftests/vm/vm_util.c @@ -114,3 +114,13 @@ bool check_huge_anon(void *addr, int nr_hpages, uint64_t hpage_size) { return __check_huge(addr, "AnonHugePages: ", nr_hpages, hpage_size); } + +bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size) +{ + return __check_huge(addr, "FilePmdMapped:", nr_hpages, hpage_size); +} + +bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size) +{ + return __check_huge(addr, "ShmemPmdMapped:", nr_hpages, hpage_size); +} diff --git a/tools/testing/selftests/vm/vm_util.h b/tools/testing/selftests/vm/vm_util.h index 8434ea0c95cd..5c35de454e08 100644 --- a/tools/testing/selftests/vm/vm_util.h +++ b/tools/testing/selftests/vm/vm_util.h @@ -8,3 +8,5 @@ void clear_softdirty(void); bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t len); uint64_t read_pmd_pagesize(void); bool check_huge_anon(void *addr, int nr_hpages, uint64_t hpage_size); +bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size); +bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size);
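The check_huge_file()/check_huge_shmem() helpers added to vm_util.c above are thin wrappers around __check_huge(), which matches the "FilePmdMapped:"/"ShmemPmdMapped:" fields of /proc/<pid>/smaps against nr_hpages * hpage_size. For readers unfamiliar with that helper, a rough standalone sketch of the mechanism might look like the following (illustrative only and simplified: it sums the field over every mapping, whereas the real vm_util.c code first locates the VMA containing the address):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum a "<field>:   <N> kB" counter across all of /proc/self/smaps and
 * compare against the expected size in kilobytes.
 */
static bool smaps_field_matches(const char *field, unsigned long expect_kb)
{
	FILE *fp = fopen("/proc/self/smaps", "r");
	char line[1024];
	unsigned long total_kb = 0, kb;

	if (!fp)
		return false;
	while (fgets(line, sizeof(line), fp))
		if (!strncmp(line, field, strlen(field)) &&
		    sscanf(line + strlen(field), "%lu kB", &kb) == 1)
			total_kb += kb;
	fclose(fp);
	return total_kb == expect_kb;
}

With a 2M PMD size, smaps_field_matches("ShmemPmdMapped:", 2048) would then correspond to one PMD-mapped shmem hugepage.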
From patchwork Fri Aug 26 22:03:27 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12956672
From: "Zach O'Keefe" <zokeefe@google.com>
To: linux-mm@kvack.org
Date: Fri, 26 Aug 2022 15:03:27 -0700
In-Reply-To: <20220826220329.1495407-1-zokeefe@google.com>
References: <20220826220329.1495407-1-zokeefe@google.com>
Message-ID: <20220826220329.1495407-9-zokeefe@google.com>
Subject: [PATCH mm-unstable v2 8/9] selftests/vm: add thp collapse shmem testing
Shutemov" , Minchan Kim , Patrick Xia , "Zach O'Keefe" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1661551428; a=rsa-sha256; cv=none; b=zf9Lbz46x3mFrGTVEzq+9Xutti9kLhof4h1SNRH7sYBEQ2dl07elyjxkZYPLfWyNMY6mkG vbNovLUP/hMx3RHe8VU0ZXo1ZQjct3Ha/lNqV6H8Sy12bCy2s3NAhdTgHObRZhNgGl2HCh qVHw0f1p9/6wKII7BmLtdVA+vSdA9Hc= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=RD3ae8mU; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf16.hostedemail.com: domain of 3Q0MJYwcKCA4D2ysstsu22uzs.q20zw18B-00y9oqy.25u@flex--zokeefe.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=3Q0MJYwcKCA4D2ysstsu22uzs.q20zw18B-00y9oqy.25u@flex--zokeefe.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661551428; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=r4w9NufzePdVztPtjIIPVWCK5v6Iu7NS+jmAu51qoDw=; b=2ejAP3RZl5iURhw4t5wXcpJnctmecWOq9UnNlrE8vf5UTevOaR9M0cgzD3o564jiref2hm pur2ALec3b1WgrQ6ozIoP0CnaXGRIVdwUUsR3eNAVt3G1eZVdQLAPFwkx05oOSWfLddlVq UzrEe5daqpt4v7bSTNQZyX4dFBH/aGk= X-Rspamd-Queue-Id: 9306118001A Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=RD3ae8mU; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf16.hostedemail.com: domain of 3Q0MJYwcKCA4D2ysstsu22uzs.q20zw18B-00y9oqy.25u@flex--zokeefe.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=3Q0MJYwcKCA4D2ysstsu22uzs.q20zw18B-00y9oqy.25u@flex--zokeefe.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam01 X-Stat-Signature: an51uqfujyapwubdt4dn1tgqmqfjj6hz X-HE-Tag: 1661551428-612640 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add memory operations for shmem (memfd) memory, and reuse existing tests with the new memory operations. Shmem tests can be called with "shmem" mem_type, and shmem tests are ran with "all" mem_type as well. 
Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 57 ++++++++++++++++++++++++- 1 file changed, 55 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index 0ddfffb87411..9a136f65d43a 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -57,6 +57,7 @@ struct mem_ops { static struct mem_ops *file_ops; static struct mem_ops *anon_ops; +static struct mem_ops *shmem_ops; struct collapse_context { void (*collapse)(const char *msg, char *p, int nr_hpages, @@ -637,6 +638,40 @@ static bool file_check_huge(void *addr, int nr_hpages) } +static void *shmem_setup_area(int nr_hpages) +{ + void *p; + unsigned long size = nr_hpages * hpage_pmd_size; + + finfo.fd = memfd_create("khugepaged-selftest-collapse-shmem", 0); + if (finfo.fd < 0) { + perror("memfd_create()"); + exit(EXIT_FAILURE); + } + if (ftruncate(finfo.fd, size)) { + perror("ftruncate()"); + exit(EXIT_FAILURE); + } + p = mmap(BASE_ADDR, size, PROT_READ | PROT_WRITE, MAP_SHARED, finfo.fd, + 0); + if (p != BASE_ADDR) { + perror("mmap()"); + exit(EXIT_FAILURE); + } + return p; +} + +static void shmem_cleanup_area(void *p, unsigned long size) +{ + munmap(p, size); + close(finfo.fd); +} + +static bool shmem_check_huge(void *addr, int nr_hpages) +{ + return check_huge_shmem(addr, nr_hpages, hpage_pmd_size); +} + static struct mem_ops __anon_ops = { .setup_area = &anon_setup_area, .cleanup_area = &anon_cleanup_area, @@ -653,6 +688,14 @@ static struct mem_ops __file_ops = { .name = "file", }; +static struct mem_ops __shmem_ops = { + .setup_area = &shmem_setup_area, + .cleanup_area = &shmem_cleanup_area, + .fault = &anon_fault, + .check_huge = &shmem_check_huge, + .name = "shmem", +}; + static void __madvise_collapse(const char *msg, char *p, int nr_hpages, struct mem_ops *ops, bool expect) { @@ -1214,7 +1257,7 @@ static void usage(void) fprintf(stderr, "\nUsage: ./khugepaged <test type> [dir]\n\n"); fprintf(stderr, "\t<test type>\t: <context>:<mem_type>\n"); fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n"); - fprintf(stderr, "\t<mem_type>\t: [all|anon|file]\n"); + fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n"); fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n"); fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n"); fprintf(stderr, "\tCONFIG_READ_ONLY_THP_FOR_FS=y\n"); @@ -1256,10 +1299,13 @@ static void parse_test_type(int argc, const char **argv) if (!strcmp(buf, "all")) { file_ops = &__file_ops; anon_ops = &__anon_ops; + shmem_ops = &__shmem_ops; } else if (!strcmp(buf, "anon")) { anon_ops = &__anon_ops; } else if (!strcmp(buf, "file")) { file_ops = &__file_ops; + } else if (!strcmp(buf, "shmem")) { + shmem_ops = &__shmem_ops; } else { usage(); } @@ -1278,7 +1324,7 @@ int main(int argc, const char **argv) struct settings default_settings = { .thp_enabled = THP_MADVISE, .thp_defrag = THP_DEFRAG_ALWAYS, - .shmem_enabled = SHMEM_NEVER, + .shmem_enabled = SHMEM_ADVISE, .use_zero_page = 0, .khugepaged = { .defrag = 1, @@ -1322,16 +1368,20 @@ int main(int argc, const char **argv) TEST(collapse_full, khugepaged_context, anon_ops); TEST(collapse_full, khugepaged_context, file_ops); + TEST(collapse_full, khugepaged_context, shmem_ops); TEST(collapse_full, madvise_context, anon_ops); TEST(collapse_full, madvise_context, file_ops); + TEST(collapse_full, madvise_context, shmem_ops); TEST(collapse_empty, khugepaged_context, anon_ops); TEST(collapse_empty, madvise_context, anon_ops); TEST(collapse_single_pte_entry,
khugepaged_context, anon_ops); TEST(collapse_single_pte_entry, khugepaged_context, file_ops); + TEST(collapse_single_pte_entry, khugepaged_context, shmem_ops); TEST(collapse_single_pte_entry, madvise_context, anon_ops); TEST(collapse_single_pte_entry, madvise_context, file_ops); + TEST(collapse_single_pte_entry, madvise_context, shmem_ops); TEST(collapse_max_ptes_none, khugepaged_context, anon_ops); TEST(collapse_max_ptes_none, khugepaged_context, file_ops); @@ -1345,8 +1395,10 @@ int main(int argc, const char **argv) TEST(collapse_full_of_compound, khugepaged_context, anon_ops); TEST(collapse_full_of_compound, khugepaged_context, file_ops); + TEST(collapse_full_of_compound, khugepaged_context, shmem_ops); TEST(collapse_full_of_compound, madvise_context, anon_ops); TEST(collapse_full_of_compound, madvise_context, file_ops); + TEST(collapse_full_of_compound, madvise_context, shmem_ops); TEST(collapse_compound_extreme, khugepaged_context, anon_ops); TEST(collapse_compound_extreme, madvise_context, anon_ops); @@ -1368,6 +1420,7 @@ int main(int argc, const char **argv) TEST(madvise_collapse_existing_thps, madvise_context, anon_ops); TEST(madvise_collapse_existing_thps, madvise_context, file_ops); + TEST(madvise_collapse_existing_thps, madvise_context, shmem_ops); restore_settings(0); }
From patchwork Fri Aug 26 22:03:28 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12956673
From: "Zach O'Keefe" <zokeefe@google.com>
To: linux-mm@kvack.org
Date: Fri, 26 Aug 2022 15:03:28 -0700
In-Reply-To: <20220826220329.1495407-1-zokeefe@google.com>
References: <20220826220329.1495407-1-zokeefe@google.com>
Message-ID: <20220826220329.1495407-10-zokeefe@google.com>
Subject: [PATCH mm-unstable v2 9/9] selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
Shutemov" , Minchan Kim , Patrick Xia , "Zach O'Keefe" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1661551431; a=rsa-sha256; cv=none; b=sAUtCysCqlYRFXq74pCXVCXrrISEeZYFRPYTRwQW0CcNP79nRs8aD9BR3z+g/iIB0qqfKQ yDwFaJY5er3uiOqS9WJ37T34bdoGHfKlHnFKQ+KpNQdJjrlAGtM/DotAwyR17wqnPawucd 3+iKH8DaL1AiihtTNJospH3CVgD+YLM= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=JwW2pBoQ; spf=pass (imf07.hostedemail.com: domain of 3RUMJYwcKCBAF40uuvuw44w1u.s421y3AD-220Bqs0.47w@flex--zokeefe.bounces.google.com designates 209.85.214.201 as permitted sender) smtp.mailfrom=3RUMJYwcKCBAF40uuvuw44w1u.s421y3AD-220Bqs0.47w@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661551431; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=o1HiHhcgm+W3HuMNDGlnkIBz4rFIRMLbuSwnYzNvFPQ=; b=wWlQjcFmGGgN/GR7FNq/XCj4/sZnzMY0CNj/Z2MQ1bOUr655FxTKz4IU32pbG93EFldql0 U7R+1RdGBXtbEl9U+nDJblguFslVacIkbUy3l+FuSCcaOEha6XGdtjl69xacFoZqU+r8Ll 0Piq/xExITrB3oX8fHNm8++60riZ8Co= X-Stat-Signature: spguxmwkn9k39x5gex1yz1q4q8awbraj X-Rspamd-Queue-Id: 72DD540012 X-Rspam-User: X-Rspamd-Server: rspam06 Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=JwW2pBoQ; spf=pass (imf07.hostedemail.com: domain of 3RUMJYwcKCBAF40uuvuw44w1u.s421y3AD-220Bqs0.47w@flex--zokeefe.bounces.google.com designates 209.85.214.201 as permitted sender) smtp.mailfrom=3RUMJYwcKCBAF40uuvuw44w1u.s421y3AD-220Bqs0.47w@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1661551431-518602 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add :collapse mod to userfaultfd selftest. Currently this mod is only valid for "shmem" test type, but could be used for other test types. When provided, memory allocated by ->allocate_area() will be hugepage-aligned enforced to be hugepage-sized. userfaultf_minor_test, after the UFFD-registered mapping has been populated by UUFD minor fault handler, attempt to MADV_COLLAPSE the UFFD-registered mapping to collapse the memory into a pmd-mapped THP. This test is meant to be a functional test of what occurs during UFFD-driven live migration of VMs backed by huge tmpfs where, after a hugepage-sized region has been successfully migrated (in native page-sized chunks, to avoid latency of fetched a hugepage over the network), we want to reclaim previous VM performance by remapping it at the PMD level. 
Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/userfaultfd.c | 171 ++++++++++++++++++----- 2 files changed, 134 insertions(+), 38 deletions(-) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index df4fa77febca..c22b5b613296 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -101,6 +101,7 @@ $(OUTPUT)/khugepaged: vm_util.c $(OUTPUT)/madv_populate: vm_util.c $(OUTPUT)/soft-dirty: vm_util.c $(OUTPUT)/split_huge_page_test: vm_util.c +$(OUTPUT)/userfaultfd: vm_util.c ifeq ($(MACHINE),x86_64) BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32)) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 7be709d9eed0..74babdbc02e5 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -61,10 +61,11 @@ #include #include "../kselftest.h" +#include "vm_util.h" #ifdef __NR_userfaultfd -static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size; +static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size, hpage_size; #define BOUNCE_RANDOM (1<<0) #define BOUNCE_RACINGFAULTS (1<<1) @@ -79,6 +80,8 @@ static int test_type; #define UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY) +#define BASE_PMD_ADDR ((void *)(1UL << 30)) + /* test using /dev/userfaultfd, instead of userfaultfd(2) */ static bool test_dev_userfaultfd; @@ -97,9 +100,10 @@ static int huge_fd; static unsigned long long *count_verify; static int uffd = -1; static int uffd_flags, finished, *pipefd; -static char *area_src, *area_src_alias, *area_dst, *area_dst_alias; +static char *area_src, *area_src_alias, *area_dst, *area_dst_alias, *area_remap; static char *zeropage; pthread_attr_t attr; +static bool test_collapse; /* Userfaultfd test statistics */ struct uffd_stats { @@ -127,6 +131,8 @@ struct uffd_stats { #define swap(a, b) \ do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0) +#define factor_of_2(x) ((x) ^ ((x) & ((x) - 1))) + const char *examples = "# Run anonymous memory test on 100MiB region with 99999 bounces:\n" "./userfaultfd anon 100 99999\n\n" @@ -152,6 +158,8 @@ static void usage(void) "Supported mods:\n"); fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n"); fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n"); + fprintf(stderr, "\tcollapse - Test MADV_COLLAPSE of UFFDIO_REGISTER_MODE_MINOR\n" + "memory\n"); fprintf(stderr, "\nExample test mod usage:\n"); fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n"); fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n"); @@ -229,12 +237,10 @@ static void anon_release_pages(char *rel_area) err("madvise(MADV_DONTNEED) failed"); } -static void anon_allocate_area(void **alloc_area) +static void anon_allocate_area(void **alloc_area, bool is_src) { *alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); - if (*alloc_area == MAP_FAILED) - err("mmap of anonymous memory failed"); } static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset) @@ -252,7 +258,7 @@ static void hugetlb_release_pages(char *rel_area) } } -static void hugetlb_allocate_area(void **alloc_area) +static void hugetlb_allocate_area(void **alloc_area, bool is_src) { void *area_alias = NULL; char **alloc_area_alias; @@ -262,7 +268,7 @@ static void hugetlb_allocate_area(void **alloc_area) nr_pages * page_size, PROT_READ | PROT_WRITE, 
MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | - (*alloc_area == area_src ? 0 : MAP_NORESERVE), + (is_src ? 0 : MAP_NORESERVE), -1, 0); else @@ -270,9 +276,9 @@ static void hugetlb_allocate_area(void **alloc_area, bool is_src) nr_pages * page_size, PROT_READ | PROT_WRITE, MAP_SHARED | - (*alloc_area == area_src ? 0 : MAP_NORESERVE), + (is_src ? 0 : MAP_NORESERVE), huge_fd, - *alloc_area == area_src ? 0 : nr_pages * page_size); + is_src ? 0 : nr_pages * page_size); if (*alloc_area == MAP_FAILED) err("mmap of hugetlbfs file failed"); @@ -282,12 +288,12 @@ static void hugetlb_allocate_area(void **alloc_area, bool is_src) PROT_READ | PROT_WRITE, MAP_SHARED, huge_fd, - *alloc_area == area_src ? 0 : nr_pages * page_size); + is_src ? 0 : nr_pages * page_size); if (area_alias == MAP_FAILED) err("mmap of hugetlb file alias failed"); } - if (*alloc_area == area_src) { + if (is_src) { alloc_area_alias = &area_src_alias; } else { alloc_area_alias = &area_dst_alias; @@ -310,21 +316,36 @@ static void shmem_release_pages(char *rel_area) err("madvise(MADV_REMOVE) failed"); } -static void shmem_allocate_area(void **alloc_area) +static void shmem_allocate_area(void **alloc_area, bool is_src) { void *area_alias = NULL; - bool is_src = alloc_area == (void **)&area_src; - unsigned long offset = is_src ? 0 : nr_pages * page_size; + size_t bytes = nr_pages * page_size; + unsigned long offset = is_src ? 0 : bytes; + char *p = NULL, *p_alias = NULL; + + if (test_collapse) { + p = BASE_PMD_ADDR; + if (!is_src) + /* src map + alias + interleaved hpages */ + p += 2 * (bytes + hpage_size); + p_alias = p; + p_alias += bytes; + p_alias += hpage_size; /* Prevent src/dst VMA merge */ + } - *alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE, - MAP_SHARED, shm_fd, offset); + *alloc_area = mmap(p, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, + shm_fd, offset); if (*alloc_area == MAP_FAILED) err("mmap of memfd failed"); + if (test_collapse && *alloc_area != p) + err("mmap of memfd failed at %p", p); - area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE, - MAP_SHARED, shm_fd, offset); + area_alias = mmap(p_alias, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, + shm_fd, offset); if (area_alias == MAP_FAILED) err("mmap of memfd alias failed"); + if (test_collapse && area_alias != p_alias) + err("mmap of memfd alias failed at %p", p_alias); if (is_src) area_src_alias = area_alias; @@ -337,28 +358,39 @@ static void shmem_alias_mapping(__u64 *start, size_t len, unsigned long offset) *start = (unsigned long)area_dst_alias + offset; } +static void shmem_check_pmd_mapping(void *p, int expect_nr_hpages) +{ + if (!check_huge_shmem(area_dst_alias, expect_nr_hpages, hpage_size)) + err("Did not find expected %d hugepages", + expect_nr_hpages); +} + struct uffd_test_ops { - void (*allocate_area)(void **alloc_area); + void (*allocate_area)(void **alloc_area, bool is_src); void (*release_pages)(char *rel_area); void (*alias_mapping)(__u64 *start, size_t len, unsigned long offset); + void (*check_pmd_mapping)(void *p, int expect_nr_hpages); }; static struct uffd_test_ops anon_uffd_test_ops = { .allocate_area = anon_allocate_area, .release_pages = anon_release_pages, .alias_mapping = noop_alias_mapping, + .check_pmd_mapping = NULL, }; static struct uffd_test_ops shmem_uffd_test_ops = { .allocate_area = shmem_allocate_area, .release_pages = shmem_release_pages, .alias_mapping = shmem_alias_mapping, + .check_pmd_mapping = shmem_check_pmd_mapping, }; static struct uffd_test_ops hugetlb_uffd_test_ops = { .allocate_area =
hugetlb_allocate_area, .release_pages = hugetlb_release_pages, .alias_mapping = hugetlb_alias_mapping, + .check_pmd_mapping = NULL, }; static struct uffd_test_ops *uffd_test_ops; @@ -478,6 +510,7 @@ static void uffd_test_ctx_clear(void) munmap_area((void **)&area_src_alias); munmap_area((void **)&area_dst); munmap_area((void **)&area_dst_alias); + munmap_area((void **)&area_remap); } static void uffd_test_ctx_init(uint64_t features) @@ -486,8 +519,8 @@ static void uffd_test_ctx_init(uint64_t features) uffd_test_ctx_clear(); - uffd_test_ops->allocate_area((void **)&area_src); - uffd_test_ops->allocate_area((void **)&area_dst); + uffd_test_ops->allocate_area((void **)&area_src, true); + uffd_test_ops->allocate_area((void **)&area_dst, false); userfaultfd_open(&features); @@ -804,6 +837,7 @@ static void *uffd_poll_thread(void *arg) err("remove failure"); break; case UFFD_EVENT_REMAP: + area_remap = area_dst; /* save for later unmap */ area_dst = (char *)(unsigned long)msg.arg.remap.to; break; } @@ -1256,13 +1290,30 @@ static int userfaultfd_sig_test(void) return userfaults != 0; } +void check_memory_contents(char *p) +{ + unsigned long i; + uint8_t expected_byte; + void *expected_page; + + if (posix_memalign(&expected_page, page_size, page_size)) + err("out of memory"); + + for (i = 0; i < nr_pages; ++i) { + expected_byte = ~((uint8_t)(i % ((uint8_t)-1))); + memset(expected_page, expected_byte, page_size); + if (my_bcmp(expected_page, p + (i * page_size), page_size)) + err("unexpected page contents after minor fault"); + } + + free(expected_page); +} + static int userfaultfd_minor_test(void) { - struct uffdio_register uffdio_register; unsigned long p; + struct uffdio_register uffdio_register; pthread_t uffd_mon; - uint8_t expected_byte; - void *expected_page; char c; struct uffd_stats stats = { 0 }; @@ -1301,17 +1352,7 @@ static int userfaultfd_minor_test(void) * fault. uffd_poll_thread will resolve the fault by bit-flipping the * page's contents, and then issuing a CONTINUE ioctl. */ - - if (posix_memalign(&expected_page, page_size, page_size)) - err("out of memory"); - - for (p = 0; p < nr_pages; ++p) { - expected_byte = ~((uint8_t)(p % ((uint8_t)-1))); - memset(expected_page, expected_byte, page_size); - if (my_bcmp(expected_page, area_dst_alias + (p * page_size), - page_size)) - err("unexpected page contents after minor fault"); - } + check_memory_contents(area_dst_alias); if (write(pipefd[1], &c, sizeof(c)) != sizeof(c)) err("pipe write"); @@ -1320,6 +1361,23 @@ static int userfaultfd_minor_test(void) uffd_stats_report(&stats, 1); + if (test_collapse) { + printf("testing collapse of uffd memory into PMD-mapped THPs:"); + if (madvise(area_dst_alias, nr_pages * page_size, + MADV_COLLAPSE)) + err("madvise(MADV_COLLAPSE)"); + + uffd_test_ops->check_pmd_mapping(area_dst, + nr_pages * page_size / + hpage_size); + /* + * This won't cause uffd-fault - it purely just makes sure there + * was no corruption. 
+ */ + check_memory_contents(area_dst_alias); + printf(" done.\n"); + } + return stats.missing_faults != 0 || stats.minor_faults != nr_pages; } @@ -1656,6 +1714,8 @@ static void parse_test_type_arg(const char *raw_type) test_dev_userfaultfd = true; else if (!strcmp(token, "syscall")) test_dev_userfaultfd = false; + else if (!strcmp(token, "collapse")) + test_collapse = true; else err("unrecognized test mod '%s'", token); } @@ -1663,8 +1723,11 @@ if (!test_type) err("failed to parse test type argument: '%s'", raw_type); + if (test_collapse && test_type != TEST_SHMEM) + err("Unsupported test: %s", raw_type); + if (test_type == TEST_HUGETLB) - page_size = default_huge_page_size(); + page_size = hpage_size; else page_size = sysconf(_SC_PAGE_SIZE); @@ -1702,6 +1765,8 @@ static void sigalrm(int sig) int main(int argc, char **argv) { + size_t bytes; + if (argc < 4) usage(); @@ -1709,11 +1774,41 @@ int main(int argc, char **argv) err("failed to arm SIGALRM"); alarm(ALARM_INTERVAL_SECS); + hpage_size = default_huge_page_size(); parse_test_type_arg(argv[1]); + bytes = atol(argv[2]) * 1024 * 1024; + + if (test_collapse && bytes & (hpage_size - 1)) + err("MiB must be multiple of %lu if :collapse mod set", + hpage_size >> 20); nr_cpus = sysconf(_SC_NPROCESSORS_ONLN); - nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size / - nr_cpus; + + if (test_collapse) { + /* + * nr_cpus must divide (bytes / page_size); otherwise, + * area allocations of (nr_pages * page_size) won't be a + * multiple of hpage_size, even if bytes is a multiple of + * hpage_size. + * + * This means that nr_cpus must divide (N * (1 << (H - P))) + * where: + * bytes = hpage_size * N + * hpage_size = 1 << H + * page_size = 1 << P + * + * And we want to choose nr_cpus to be the largest value + * satisfying this constraint, not larger than the number + * of online CPUs. Unfortunately, the prime factorizations + * of N and nr_cpus may be arbitrary, so we would have to + * search for it. Instead, just use the highest power of 2 + * dividing both nr_cpus and (bytes / page_size): + * factor_of_2() isolates the lowest set bit of its + * argument (e.g. factor_of_2(12) == 4), which is exactly + * the largest power of 2 dividing it. + */ + int x = factor_of_2(nr_cpus); + int y = factor_of_2(bytes / page_size); + + nr_cpus = x < y ? x : y; + } + nr_pages_per_cpu = bytes / page_size / nr_cpus; if (!nr_pages_per_cpu) { _err("invalid MiB"); usage();