From patchwork Fri Aug 12 01:28:34 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12941848
Date: Thu, 11 Aug 2022 18:28:34 -0700
Message-Id: <20220812012843.3948330-1-zokeefe@google.com>
Subject: [PATCH mm-unstable 0/9] mm: add file/shmem support to MADV_COLLAPSE
From: "Zach O'Keefe"
To: linux-mm@kvack.org
Cc: Andrew Morton, linux-api@vger.kernel.org, Axel Rasmussen,
 James Houghton, Hugh Dickins, Yang Shi, Miaohe Lin, David Hildenbrand,
 David Rientjes, Matthew Wilcox, Pasha Tatashin, Peter Xu, Rongwei Wang,
 SeongJae Park, Song Liu, Vlastimil Babka, Chris Kennelly,
 "Kirill A. Shutemov", Minchan Kim, Patrick Xia, "Zach O'Keefe"

This series builds on top of the previous "mm: userspace hugepage collapse"
series which introduced the MADV_COLLAPSE madvise mode and added support
for private, anonymous mappings[1], by adding support for file and shmem
backed memory to CONFIG_READ_ONLY_THP_FOR_FS=y kernels.

File and shmem support have been added with effort to align with existing
MADV_COLLAPSE semantics and policy decisions[2].  Collapse of shmem-backed
memory ignores kernel-guiding directives and heuristics including all
sysfs settings (transparent_hugepage/shmem_enabled), and tmpfs huge= mount
options (shmem always supports large folios).  Like anonymous mappings, on
successful return of MADV_COLLAPSE on file/shmem memory, the contents of
memory mapped by the addresses provided will be synchronously pmd-mapped
THPs.

This functionality unlocks two important uses:

(1) Immediately back executable text by THPs.  Current support provided
    by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large
    system, which might impair services from serving at their full rated
    load after (re)starting.  Tricks like mremap(2)'ing text onto
    anonymous memory to immediately realize iTLB performance prevent
    page sharing and demand paging, both of which increase steady state
    memory footprint.  Now, we can have the best of both worlds: peak
    upfront performance and lower RAM footprints.

(2) userfaultfd-based live migration of virtual machines satisfies UFFD
    faults by fetching native-sized pages over the network (to avoid the
    latency of transferring an entire hugepage).  However, after guest
    memory has been fully copied to the new host, MADV_COLLAPSE can be
    used to immediately increase guest performance.

khugepaged has received a small improvement by association and can now
detect and collapse pte-mapped THPs.  However, there is still work to be
done along the file collapse path.  Compound pages of arbitrary order
still need to be supported, and THP collapse needs to be converted to
using folios in general.  Eventually, we'd like to move away from the
read-only and executable-mapped constraints currently imposed on eligible
files and support any inode claiming huge folio support.  That said, I
think the series as-is covers enough to claim that MADV_COLLAPSE supports
file/shmem memory.

Patches 1-3   Implement the guts of the series.
Patch 4       Is a tracepoint for debugging.
Patches 5-8   Refactor existing khugepaged selftests to work with new
              memory types.
Patch 9       Adds a userfaultfd selftest mode to mimic a functional test
              of UFFDIO_REGISTER_MODE_MINOR+MADV_COLLAPSE live migration.

Applies against mm-unstable.

[1] https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
[2] https://lore.kernel.org/linux-mm/YtBmhaiPHUTkJml8@google.com/

Zach O'Keefe (9):
  mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()
  mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by
    pmds
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  selftests/vm: dedup THP helpers
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory

 include/linux/khugepaged.h                    |  11 +-
 include/linux/shmem_fs.h                      |  10 +-
 include/trace/events/huge_memory.h            |  36 +
 kernel/events/uprobes.c                       |   2 +-
 mm/huge_memory.c                              |   2 +-
 mm/khugepaged.c                               | 285 ++++--
 mm/shmem.c                                    |  18 +-
 tools/testing/selftests/vm/Makefile           |   2 +
 tools/testing/selftests/vm/khugepaged.c       | 828 ++++++++++++------
 tools/testing/selftests/vm/soft-dirty.c       |   2 +-
 .../selftests/vm/split_huge_page_test.c       |  12 +-
 tools/testing/selftests/vm/userfaultfd.c      | 171 +++-
 tools/testing/selftests/vm/vm_util.c          |  36 +-
 tools/testing/selftests/vm/vm_util.h          |   5 +-
 14 files changed, 1034 insertions(+), 386 deletions(-)
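
For illustration only, use (1) above might look roughly like the
following from userspace.  This is a minimal sketch, not code from the
series: HPAGE_SIZE, hpage_align_up(), hpage_align_down() and
collapse_text_range() are hypothetical names, the 2 MiB hugepage size
is an assumption (x86-64 with 4 KiB base pages), and the fallback
definition of MADV_COLLAPSE is only meant for libc headers that predate
it.  Shrinking the range inward to hugepage boundaries is a conservative
choice made by the sketch, so that only regions the caller fully owns
are requested.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* asm-generic uapi value; assumed for old libc headers */
#endif

/* Assumption: PMD-sized THPs are 2 MiB (x86-64, 4 KiB base pages). */
#define HPAGE_SIZE ((uintptr_t)2 << 20)

/* Round an address up to the next hugepage boundary. */
static uintptr_t hpage_align_up(uintptr_t addr)
{
	return (addr + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
}

/* Round an address down to the previous hugepage boundary. */
static uintptr_t hpage_align_down(uintptr_t addr)
{
	return addr & ~(HPAGE_SIZE - 1);
}

/*
 * Hypothetical helper: request a synchronous collapse of every
 * hugepage-aligned, hugepage-sized region fully inside
 * [addr, addr + len), e.g. a read-only, executable file mapping.
 *
 * Returns 0 on success, -EINVAL if the range contains no such region,
 * or -errno as reported by madvise(2).
 */
static int collapse_text_range(void *addr, size_t len)
{
	uintptr_t start, end;

	assert(len != 0); /* a zero-length request is a caller bug here */

	start = hpage_align_up((uintptr_t)addr);
	end = hpage_align_down((uintptr_t)addr + len);
	if (start >= end)
		return -EINVAL;

	if (madvise((void *)start, end - start, MADV_COLLAPSE) != 0)
		return -errno;
	return 0;
}
```

A service could call collapse_text_range() on its own text mapping once
at startup and treat any negative return as "fall back to khugepaged",
since the call is best-effort and kernels without MADV_COLLAPSE are
expected to reject the advice.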