Subject: [PATCH v6 00/23] userfaultfd-wp: Support shmem and hugetlbfs
From: Peter Xu
Date: Mon, 15 Nov 2021 15:54:59 +0800
Message-Id: <20211115075522.73795-1-peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen, Nadav Amit, Mike Rapoport, Hugh Dickins, Mike Kravetz, "Kirill A . Shutemov", Alistair Popple, Jerome Glisse, Matthew Wilcox, Andrew Morton, peterx@redhat.com, David Hildenbrand, Andrea Arcangeli

This is v6 of the series to add shmem+hugetlbfs support for userfaultfd write protection.
It is based on v5.16-rc1 (fa55b7dcdc43), with the two patches below applied first:

  Subject: [PATCH RFC 0/2] mm: Rework zap ptes on swap entries
  https://lore.kernel.org/lkml/20211110082952.19266-1-peterx@redhat.com/

The whole tree can be found here for testing:

  https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs

Previous versions:

  RFC: https://lore.kernel.org/lkml/20210115170907.24498-1-peterx@redhat.com/
  v1:  https://lore.kernel.org/lkml/20210323004912.35132-1-peterx@redhat.com/
  v2:  https://lore.kernel.org/lkml/20210427161317.50682-1-peterx@redhat.com/
  v3:  https://lore.kernel.org/lkml/20210527201927.29586-1-peterx@redhat.com/
  v4:  https://lore.kernel.org/lkml/20210714222117.47648-1-peterx@redhat.com/
  v5:  https://lore.kernel.org/lkml/20210715201422.211004-1-peterx@redhat.com/

Overview
========

This is the first version of this work that rebases the uffd-wp logic upon PTE markers. The major logic is the same as in v5, but since there are quite a few minor changes here and there, I decided not to provide a changelog at all, as it would no longer be helpful. However, I believe I have addressed all the comments raised by reviewers; please shout if I missed something.

I kept many of Mike's Reviewed-by tags where the patch content is almost unchanged (I touched up quite a few commit messages), but it would be nice if Mike could still go over the patches even where R-bs are standing.

PTE marker is a new type of swap entry that is only applicable to file-backed memory like shmem and hugetlbfs. It is used to persist some pte-level information even if the original present ptes in the pgtable are zapped. This information could be one of:

  (1) Userfaultfd wr-protect information
  (2) PTE soft-dirty information
  (3) Or others

This series only uses the marker to store uffd-wp information across temporary zappings of shmem/hugetlbfs pgtables, for example, when a shmem thp is split.
So even if ptes are temporarily zapped, the wr-protect information can still be kept within the pgtables. Then, when the page fault triggers again, we'll know this pte is wr-protected, so we can treat the pte the same as a normal uffd wr-protected pte.

The extra information is encoded into the swap entry, or swp_offset to be explicit, with the swp_type being PTE_MARKER. So far uffd-wp only uses one bit out of the swap entry; the remaining bits of swp_offset are still reserved for other purposes.

There are two configs to enable/disable PTE markers:

  CONFIG_PTE_MARKER
  CONFIG_PTE_MARKER_UFFD_WP

We can set !PTE_MARKER to completely disable all PTE markers, along with uffd-wp support. I made two configs so that we can enable PTE markers for other purposes while keeping file-backed uffd-wp disabled. The last patch of this series enables CONFIG_PTE_MARKER by default, but that patch is standalone; if anyone worries about having it on by default, we can consider turning it off by dropping that one-liner patch. So far I don't see a huge risk in enabling it, so I kept that patch.

In most cases, PTE markers should be treated as none ptes. This is because, unlike most of the other swap entry types, there is no PFN or block offset information encoded into PTE markers, only some extra well-defined bits showing the status of the pte. These bits should only be used as extra data when servicing an upcoming page fault, and that should be it.

I did spend a lot of time auditing all the pte_none() users this time. It is indeed a challenge because there are a lot of them, and I hope I didn't miss any place where pte markers need to be taken care of. Luckily, I don't think pte markers need to be considered in many cases, for example: boot code, arch code (especially non-x86), kernel-only page handling (e.g. CPA), or device driver code dealing with pure PFN mappings.
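As a rough illustration of the encoding and of how the wr-protect bit survives a zap, here is a small user-space toy model. It is a sketch under stated assumptions, not the kernel implementation: the swap-entry bit layout, the type number and every helper name (make_marker(), is_marker(), zap(), write_fault_needs_uffd_wp(), ...) are hypothetical and only mirror kernel naming loosely, and the model ignores the VMA checks (file-backed, registered with uffd-wp) that real code performs before installing a marker.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy user-space model of a PTE marker. The bit layout, the type number
 * and the helper names are illustrative assumptions, not kernel code.
 */
#define SWP_TYPE_SHIFT   58                          /* assumed: type in top 6 bits */
#define SWP_OFFSET_MASK  ((UINT64_C(1) << SWP_TYPE_SHIFT) - 1)
#define SWP_PTE_MARKER   60u                         /* hypothetical type number */
#define MARKER_UFFD_WP   (UINT64_C(1) << 0)          /* the one bit uffd-wp uses */

typedef struct { uint64_t val; } swp_entry_t;

static swp_entry_t swp_entry(unsigned type, uint64_t offset)
{
	swp_entry_t e = { ((uint64_t)type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK) };
	return e;
}

static unsigned swp_type(swp_entry_t e)   { return (unsigned)(e.val >> SWP_TYPE_SHIFT); }
static uint64_t swp_offset(swp_entry_t e) { return e.val & SWP_OFFSET_MASK; }

/* A marker has no PFN or block offset: swp_offset only holds flag bits. */
static swp_entry_t make_marker(uint64_t bits) { return swp_entry(SWP_PTE_MARKER, bits); }
static bool is_marker(swp_entry_t e)          { return swp_type(e) == SWP_PTE_MARKER; }
static uint64_t marker_bits(swp_entry_t e)    { return swp_offset(e); }

/* One pte slot of a (toy) file-backed mapping. */
enum slot_kind { SLOT_NONE, SLOT_PRESENT, SLOT_SWAP };

struct slot {
	enum slot_kind kind;
	bool uffd_wp;        /* meaningful only for SLOT_PRESENT */
	swp_entry_t entry;   /* meaningful only for SLOT_SWAP */
};

/* Temporarily zap a present pte, e.g. while a shmem THP is split. */
static void zap(struct slot *s)
{
	if (s->kind == SLOT_PRESENT && s->uffd_wp) {
		/* Persist the wr-protect bit instead of leaving a none pte. */
		s->kind = SLOT_SWAP;
		s->entry = make_marker(MARKER_UFFD_WP);
	} else {
		s->kind = SLOT_NONE;
	}
	s->uffd_wp = false;
}

/* On the next write fault: must userspace get a uffd-wp notification? */
static bool write_fault_needs_uffd_wp(const struct slot *s)
{
	if (s->kind == SLOT_PRESENT)
		return s->uffd_wp;
	if (s->kind == SLOT_SWAP && is_marker(s->entry))
		return (marker_bits(s->entry) & MARKER_UFFD_WP) != 0;
	return false;
}
```

A caller that zaps a wr-protected slot and then asks write_fault_needs_uffd_wp() still gets true, because the bit now lives in the marker's offset field within the same slot; a plain zap leaves a none slot and no notification. Keeping the information in the pgtable entry itself, rather than in a side table, is the property the toy is meant to show.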
I introduced pte_none_mostly() in this series for when we need to handle pte markers the same as none ptes; "mostly" is another way of saying "either a none pte or a pte marker".

I didn't change pte_none() to cover pte markers, for the reasons below:

- Very few pte_none() callers need to handle pte markers. E.g., none of the kernel pages require knowledge of pte markers, so we don't pollute the major use cases.

- Unconditionally changing pte_none() semantics could confuse people, because pte_none() has existed for such a long time.

- Unconditionally changing pte_none() semantics could make pte_none() slower, even though in many cases pte markers do not exist.

- There are cases where we'd like to handle pte markers differently from pte_none(), so a full replacement is impossible anyway. E.g., khugepaged should still treat pte markers as normal swap ptes rather than none ptes, because pte markers always need a fault-in to merge the marker with a valid pte. Similarly, the smap code needs to parse PTE markers rather than treat them as none ptes.
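The "either a none pte or a pte marker" semantics, and why khugepaged must not use them, can be sketched with a toy pte encoding. The encoding, the TOY_* constants and all toy_* helpers below are hypothetical and chosen for illustration; this is not the series' actual code, only a model of the distinction described above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy pte encoding (hypothetical, not any real architecture):
 *   0                   -> none pte
 *   bit 0 set           -> present pte
 *   otherwise non-zero  -> swap pte: type in bits 58..63, offset in bits 1..57
 */
typedef uint64_t toy_pte_t;

#define TOY_PRESENT     UINT64_C(1)
#define TOY_TYPE_SHIFT  58
#define TOY_OFFSET_MASK ((UINT64_C(1) << 57) - 1)   /* 57 offset bits */
#define TOY_PTE_MARKER  60u   /* hypothetical swap type reserved for markers */

static bool toy_pte_none(toy_pte_t pte)    { return pte == 0; }
static bool toy_pte_present(toy_pte_t pte) { return (pte & TOY_PRESENT) != 0; }

/* Any non-none, non-present pte is a swap pte, markers included. */
static bool toy_is_swap_pte(toy_pte_t pte)
{
	return !toy_pte_none(pte) && !toy_pte_present(pte);
}

static bool toy_is_pte_marker(toy_pte_t pte)
{
	return toy_is_swap_pte(pte) && (pte >> TOY_TYPE_SHIFT) == TOY_PTE_MARKER;
}

/* Build a marker pte carrying 'bits' in its offset field. */
static toy_pte_t toy_make_marker(uint64_t bits)
{
	return ((toy_pte_t)TOY_PTE_MARKER << TOY_TYPE_SHIFT) | ((bits & TOY_OFFSET_MASK) << 1);
}

/* "Mostly none": either a none pte or a pte marker. */
static bool toy_pte_none_mostly(toy_pte_t pte)
{
	return toy_pte_none(pte) || toy_is_pte_marker(pte);
}

/*
 * khugepaged-style check: count ptes that must be faulted in before a
 * collapse. Markers count as swap ptes here, NOT as none ptes, which is
 * why a blanket change to pte_none() would be wrong for this caller.
 */
static unsigned toy_count_need_fault_in(const toy_pte_t *ptes, unsigned n)
{
	unsigned count = 0;

	for (unsigned i = 0; i < n; i++)
		if (toy_is_swap_pte(ptes[i]))
			count++;
	return count;
}
```

In this model a fault-path caller would branch on toy_pte_none_mostly(), while the collapse path keeps using the plain swap-pte test, so the two semantics coexist without altering toy_pte_none() itself.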
Patch Layout
============

Introducing PTE marker and uffd-wp bit in PTE marker:

  mm: Introduce PTE_MARKER swap entry
  mm: Teach core mm about pte markers
  mm: Check against orig_pte for finish_fault()
  mm/uffd: PTE_MARKER_UFFD_WP

Adding support for shmem uffd-wp:

  mm/shmem: Take care of UFFDIO_COPY_MODE_WP
  mm/shmem: Handle uffd-wp special pte in page fault handler
  mm/shmem: Persist uffd-wp bit across zapping for file-backed
  mm/shmem: Allow uffd wr-protect none pte for file-backed mem
  mm/shmem: Allows file-back mem to be uffd wr-protected on thps
  mm/shmem: Handle uffd-wp during fork()

Adding support for hugetlbfs uffd-wp:

  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  mm/hugetlb: Hook page faults for uffd write protection
  mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
  mm/hugetlb: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Handle pte markers in page faults
  mm/hugetlb: Allow uffd wr-protect none ptes
  mm/hugetlb: Only drop uffd-wp special pte if required
  mm/hugetlb: Handle uffd-wp during fork()

Miscellaneous handling in the rest of mm for file-backed uffd-wp:

  mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs

Enabling uffd-wp on file-backed memory:

  mm/uffd: Enable write protection for shmem & hugetlbfs
  mm: Enable PTE markers by default
  selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

- x86_64
  - Compile tested on:
    - PTE_MARKER && PTE_MARKER_UFFD_WP
    - PTE_MARKER && !PTE_MARKER_UFFD_WP
    - !PTE_MARKER
    - !USERFAULTFD
  - Kernel userfaultfd selftests for shmem/hugetlb/hugetlb_shared
  - Umapsort [1,2] test for shmem/hugetlb, with swap on/off
- aarch64
  - Compile and smoke tested with !PTE_MARKER

[1] https://github.com/xzpeter/umap-apps/tree/peter
[2] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs

Peter Xu (23):
  mm: Introduce PTE_MARKER swap entry
  mm: Teach core mm about pte markers
  mm: Check against orig_pte for finish_fault()
  mm/uffd: PTE_MARKER_UFFD_WP
  mm/shmem: Take care of UFFDIO_COPY_MODE_WP
  mm/shmem: Handle uffd-wp special pte in page fault handler
  mm/shmem: Persist uffd-wp bit across zapping for file-backed
  mm/shmem: Allow uffd wr-protect none pte for file-backed mem
  mm/shmem: Allows file-back mem to be uffd wr-protected on thps
  mm/shmem: Handle uffd-wp during fork()
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  mm/hugetlb: Hook page faults for uffd write protection
  mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
  mm/hugetlb: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Handle pte markers in page faults
  mm/hugetlb: Allow uffd wr-protect none ptes
  mm/hugetlb: Only drop uffd-wp special pte if required
  mm/hugetlb: Handle uffd-wp during fork()
  mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  mm/uffd: Enable write protection for shmem & hugetlbfs
  mm: Enable PTE markers by default
  selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

 arch/s390/include/asm/hugetlb.h          |  15 ++
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  11 ++
 fs/userfaultfd.c                         |  31 +---
 include/asm-generic/hugetlb.h            |  24 +++
 include/linux/hugetlb.h                  |  27 ++--
 include/linux/mm.h                       |  20 +++
 include/linux/mm_inline.h                |  45 ++++++
 include/linux/shmem_fs.h                 |   4 +-
 include/linux/swap.h                     |  15 +-
 include/linux/swapops.h                  |  79 ++++++++++
 include/linux/userfaultfd_k.h            |  67 +++++++++
 include/uapi/linux/userfaultfd.h         |  10 +-
 mm/Kconfig                               |  16 ++
 mm/filemap.c                             |   5 +
 mm/hmm.c                                 |   2 +-
 mm/hugetlb.c                             | 181 +++++++++++++++++-----
 mm/khugepaged.c                          |  14 +-
 mm/memcontrol.c                          |   8 +-
 mm/memory.c                              | 184 ++++++++++++++++++++---
 mm/mincore.c                             |   3 +-
 mm/mprotect.c                            |  76 +++++++++-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |   4 +-
 mm/userfaultfd.c                         |  61 +++++---
 tools/testing/selftests/vm/userfaultfd.c |   4 +-
 26 files changed, 798 insertions(+), 131 deletions(-)