From patchwork Wed Jul 14 22:20:51 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12377943
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Jason Gunthorpe, peterx@redhat.com, Matthew Wilcox, Andrew Morton,
 Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport, Miaohe Lin,
 Hugh Dickins, Alistair Popple, Andrea Arcangeli, Mike Kravetz,
 "Kirill A . Shutemov", David Hildenbrand
Subject: [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs
Date: Wed, 14 Jul 2021 18:20:51 -0400
Message-Id: <20210714222117.47648-1-peterx@redhat.com>
X-Mailer: git-send-email 2.31.1

This is v4 of uffd-wp shmem & hugetlbfs support, which completes uffd-wp as a
full feature.  It's based on v5.14-rc1.

The whole series can also be found online [1].

Nothing big really changed from previous version.
One thing worth mentioning is that in commit 22061a1ffabd we added a new
single_page parameter to zap_details (from Hugh), so a few zap-related
patches needed some rebasing around it.  There's also the newly introduced
unmap_mapping_page(); in this series it will start to run with the zap flag
ZAP_FLAG_DROP_FILE_UFFD_WP, as this function is used as the last phase to
unmap shmem mappings when a page is being e.g. truncated.  It actually even
simplified things a bit, as I can now drop the patch "mm: Pass zap_flags into
unmap_mapping_pages()".

Another thing to mention is that I further modified (trivially) the test
program umap-apps [4] to allow the backend storage to be a shmem file.  It
used to be e.g. an XFS file; however, I noticed that disk latency is a major
bottleneck of the umapsort program, especially on an HDD, and a shmem backend
is much faster.  This should stress the kernel a bit more than before.

Full changelog listed below.

v4:
- Rebased to v5.14-rc1
- Collect r-b from Alistair
- Patch "mm/userfaultfd: Introduce special pte for unmapped file-backed mem"
  - Make pte_swp_uffd_wp_special() return false for
    !HAVE_ARCH_USERFAULTFD_WP [Alistair]
- Patch "mm: Introduce zap_details.zap_flags"
  - Rename zap_check_mapping_skip to zap_skip_check_mapping [Alistair]
- Patch "mm: Pass zap_flags into unmap_mapping_pages()"
  - Dropped the patch because after commit 22061a1ffabd it's not needed
    anymore.
- Patch "mm/userfaultfd: Enable write protection for shmem & hugetlbfs"
  - Drop UFFD_FEATURE_WP_HUGETLBFS_SHMEM too if
    !CONFIG_HAVE_ARCH_USERFAULTFD_WP
- Patch "shmem/userfaultfd: Persist uffd-wp bit across zapping for
  file-backed":
  - Convert zap_pte_range() of device private to WARN_ON_ONCE() because it
    must not be a file-backed pte [Alistair]
  - Coordinate with new commit 22061a1ffabd ("mm/thp: unmap_mapping_page()
    to fix THP truncate_cleanup_page()"), add ZAP_FLAG_CHECK_MAPPING for
    new function unmap_mapping_page().

v3:
- Rebase to v5.13-rc3-mmots-2021-05-25-20-12
- Fix commit message and comment for patch "shmem/userfaultfd: Handle
  uffd-wp special pte in page fault handler", dropping all references to
  FAULT_FLAG_UFFD_WP.
- Reworked patch "shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP" after
  Axel's refactoring on uffdio-copy/continue.
- Added patch "mm/hugetlb: Introduce huge pte version of uffd-wp helpers",
  so that huge pte helpers are introduced in one patch.  Also add the
  huge_pte_uffd_wp helper, which was missing previously.
- Added patch "mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs", to
  let the pagemap uffd-wp bit work for shmem/hugetlbfs.
- Added patch "mm/shmem: Unconditionally set pte dirty in
  mfill_atomic_install_pte", to clean up the dirty bit together in
  uffdio-copy.

v2:
- Add R-bs
- Added patch "mm/hugetlb: Drop __unmap_hugepage_range definition from
  hugetlb.h" as noticed/suggested by Mike Kravetz
- Fix commit message of patch "hugetlb/userfaultfd: Only drop uffd-wp
  special pte if required" [MikeK]
- Remove comments for fields in zap_details since they're either incorrect
  or not helping [Matthew]
- Rephrase commit message in patch "hugetlb/userfaultfd: Take care of
  UFFDIO_COPY_MODE_WP" to better explain why the dirty bit is set for
  UFFDIO_COPY in hugetlbfs [MikeK]
- Don't emulate READ for uffd-wp-special on both shmem & hugetlbfs.
- Drop the FAULT_FLAG_UFFD_WP flag, by checking vmf->orig_pte directly
  against pte_swp_uffd_wp_special()
- Fix race condition of page fault handling on uffd-wp-special [Mike]

About Swap Special PTE
======================

In short, the so-called "swap special pte" in this patchset is a new type of
pte that did not exist before; it is first used in this series, for
file-backed memory.  It is used to persist information even if the ptes got
dropped while the page cache still exists.  For example, when splitting a
file-backed huge pmd, we could simply drop the pmd entry and then wait until
another fault comes.
That was okay in the past, since all information in the pte could be
retrieved from the page cache when the next page fault triggers.  However in
this case, uffd-wp is per-pte information which cannot be kept in the page
cache, so that information still needs to be maintained somehow in the
pgtable entry, even if the pgtable entry is going to be dropped.  Here,
instead of replacing it with a none entry, we use the "swap special pte".
Then when the next page fault triggers, we can observe orig_pte to retrieve
this information.

I'm copy-pasting some of the commit message from the patch "mm/swap:
Introduce the idea of special swap ptes", where it tried to explain this pte
from another angle:

    We used to have special swap entries, like migration entries, hw-poison
    entries, device private entries, etc.

    Those "special swap entries" reside in the range that they need to be at
    least swap entries first, and their types are decided by
    swp_type(entry).

    This patch introduces another idea called "special swap ptes".

    It's very easy to get confused with "special swap entries", but a
    special swap pte should never contain a swap entry at all.  It means,
    it's illegal to call pte_to_swp_entry() upon a special swap pte.

    Make the uffd-wp special pte to be the first special swap pte.

    Before this patch, is_swap_pte()==true means one of the below:

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.

    After this patch, is_swap_pte()==true means one of the below, where case
    (b) is added:

     (a) The pte contains a swap entry.

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.

     (b) The pte does not contain a swap entry at all (so it cannot be
         passed into pte_to_swp_entry()).  For example, uffd-wp special swap
         pte.

Hugetlbfs needs a similar thing because it's also file-backed.  I directly
reused the same special pte there, though the shmem and hugetlb changes to
support this new pte are different, since they don't share much of the code
path.

Patch layout
============

Part (1): Shmem support, this is where the special swap pte is introduced.
Some zap rework is needed within the process:

  mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()

Part (2): Hugetlb support.  The "disable huge pmd sharing for uffd-wp"
patches have been merged.
The rest is the changes required to teach hugetlbfs to understand the
special swap pte too, which is introduced with the uffd-wp change:

  mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Introduce huge version of special swap pte helpers
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required

Part (3): Enable both features in code and test (plus pagemap support)

  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

I've tested it using the userfaultfd kselftest program, and also with
umapsort [2], which should be even stricter.  I also tested page swapping
in/out during umapsort.

If anyone would like to try umapsort, you need to use an extremely hacked
version of the umap library [3], because by default umap only supports
anonymous memory.  So to test it we need to build [3] then [2].

Any comment would be greatly welcomed.
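
As a side illustration of the "About Swap Special PTE" section above, the
(a.1)/(a.2)/(b) split of is_swap_pte() can be modelled in a few lines of
userspace C.  This is only a sketch under made-up assumptions, not code from
the series: every model_*/MODEL_* name, the struct layout and the
MODEL_MAX_SWAPFILES value are hypothetical and do not reflect the real pte
encoding.  The only point it tries to show is that case (b) has to be checked
before anything tries to decode a swap entry.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy userspace model, NOT kernel code.  All model_* / MODEL_* names and
 * the struct layout are hypothetical; only the case labels (a.1), (a.2)
 * and (b) come from the cover letter.
 */

/* Hypothetical cut-off: swap types at or above it are "special" entries. */
#define MODEL_MAX_SWAPFILES 27

enum model_pte_kind {
	MODEL_PTE_NONE,         /* empty slot */
	MODEL_PTE_PRESENT,      /* maps a page */
	MODEL_PTE_SWAP_ENTRY,   /* not present, carries a swap entry */
	MODEL_PTE_SWAP_SPECIAL, /* not present, carries no swap entry */
};

struct model_pte {
	enum model_pte_kind kind;
	unsigned int swp_type;  /* meaningful only for MODEL_PTE_SWAP_ENTRY */
};

enum model_swap_case {
	MODEL_NOT_SWAP,
	MODEL_CASE_A1,  /* normal swap entry */
	MODEL_CASE_A2,  /* special swap entry (migration, hw-poison, ...) */
	MODEL_CASE_B,   /* special swap pte, no swap entry inside */
};

/* Models the idea of is_swap_pte(): neither none nor present. */
static bool model_is_swap_pte(struct model_pte pte)
{
	return pte.kind == MODEL_PTE_SWAP_ENTRY ||
	       pte.kind == MODEL_PTE_SWAP_SPECIAL;
}

static bool model_is_swap_special_pte(struct model_pte pte)
{
	return pte.kind == MODEL_PTE_SWAP_SPECIAL;
}

/* Decoding a swap entry out of a special swap pte is illegal. */
static unsigned int model_pte_to_swp_type(struct model_pte pte)
{
	assert(pte.kind == MODEL_PTE_SWAP_ENTRY);
	return pte.swp_type;
}

static enum model_swap_case model_classify(struct model_pte pte)
{
	if (!model_is_swap_pte(pte))
		return MODEL_NOT_SWAP;
	/* Check (b) first so that a special swap pte is never decoded. */
	if (model_is_swap_special_pte(pte))
		return MODEL_CASE_B;
	if (model_pte_to_swp_type(pte) >= MODEL_MAX_SWAPFILES)
		return MODEL_CASE_A2;
	return MODEL_CASE_A1;
}
```

In this model an entry of kind MODEL_PTE_SWAP_SPECIAL classifies as case (b)
without ever reaching model_pte_to_swp_type(), which matches the rule that
pte_to_swp_entry() must not be called on a special swap pte.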

Thanks,

[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/xzpeter/umap-apps/tree/peter
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs
[4] https://github.com/xzpeter/umap-apps/commit/b0c2c7b1cd9dcb6835e7c59d02ece1f6b7f1ea01

Peter Xu (26):
  mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
  mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Introduce huge version of special swap pte helpers
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  mm/userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

 arch/arm64/kernel/mte.c                  |   2 +-
 arch/x86/include/asm/pgtable.h           |  28 +++
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  21 +-
 fs/userfaultfd.c                         |  41 ++--
 include/asm-generic/hugetlb.h            |  15 ++
 include/asm-generic/pgtable_uffd.h       |   3 +
 include/linux/hugetlb.h                  |  30 ++-
 include/linux/mm.h                       |  44 +++-
 include/linux/mm_inline.h                |  42 ++++
 include/linux/shmem_fs.h                 |   4 +-
 include/linux/swapops.h                  |  39 +++-
 include/linux/userfaultfd_k.h            |  49 +++++
 include/uapi/linux/userfaultfd.h         |  10 +-
 mm/gup.c                                 |   2 +-
 mm/hmm.c                                 |   2 +-
 mm/hugetlb.c                             | 160 ++++++++++++---
 mm/khugepaged.c                          |  11 +-
 mm/madvise.c                             |   4 +-
 mm/memcontrol.c                          |   2 +-
 mm/memory.c                              | 244 +++++++++++++++++------
 mm/migrate.c                             |   4 +-
 mm/mincore.c                             |   2 +-
 mm/mprotect.c                            |  63 +++++-
 mm/mremap.c                              |   2 +-
 mm/page_vma_mapped.c                     |   6 +-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |   5 +-
 mm/swapfile.c                            |   2 +-
 mm/userfaultfd.c                         |  73 +++++--
 tools/testing/selftests/vm/userfaultfd.c |   9 +-
 31 files changed, 756 insertions(+), 186 deletions(-)