From patchwork Fri Jan 15 17:08:37 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12023311
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Rapoport, Mike Kravetz, peterx@redhat.com, Jerome Glisse,
    "Kirill A . Shutemov", Hugh Dickins, Axel Rasmussen, Matthew Wilcox,
    Andrew Morton, Andrea Arcangeli, Nadav Amit
Subject: [PATCH RFC 00/30] userfaultfd-wp: Support shmem and hugetlbfs
Date: Fri, 15 Jan 2021 12:08:37 -0500
Message-Id: <20210115170907.24498-1-peterx@redhat.com>

This is an RFC series to support userfaultfd write protection (uffd-wp) on
shmem and hugetlbfs.

PS. Note that there is a known issue [0] with TLB handling for
uffd-wp/soft-dirty in general, and Nadav is working on it.  It may or may not
directly affect shmem/hugetlbfs, since shared mappings normally involve no
COW.  Private shmem could hit it, but that is a separate problem to solve in
general; this RFC is mainly meant to find out whether there is any objection
to the concept specific to uffd-wp on shmem/hugetlbfs.

The whole series can also be found online [1].

The major comment I'd like to get is on the new idea of the special swap pte.
It comes from suggestions from both Hugh and Andrea, and I greatly appreciate
those discussions.

In short, it is a new type of pte that did not exist before.  It is used in
file-backed memory to persist information after the ptes have been erased
(the page cache could still exist, for example, so on the next page fault we
can reload the page cache with that specific information when necessary).

I'm copy-pasting part of the commit message from the patch "mm/swap:
Introduce the idea of special swap ptes", where uffd-wp becomes the first
user of it:

    We used to have special swap entries, like migration entries, hw-poison
    entries, device private entries, etc.

    Those "special swap entries" must be swap entries in the first place,
    and their types are decided by swp_type(entry).

    This patch introduces another idea called "special swap ptes".

    It is very easy to confuse these with "special swap entries", but a
    special swap pte should never contain a swap entry at all.  That is, it
    is illegal to call pte_to_swp_entry() on a special swap pte.

    Make the uffd-wp special pte the first special swap pte.

    Before this patch, is_swap_pte()==true means one of the below:

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).
             For example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).
             For example, a migration entry, a hw-poison entry, etc.

    After this patch, is_swap_pte()==true means one of the below, where
    case (b) is added:

     (a) The pte contains a swap entry.

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).
             For example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).
             For example, a migration entry, a hw-poison entry, etc.

     (b) The pte does not contain a swap entry at all (so it cannot be
         passed into pte_to_swp_entry()).
         For example, the uffd-wp special swap pte.

Hugetlbfs needs a similar thing because it is also file-backed.  I directly
reused the same special pte there, though the shmem and hugetlb changes to
support this new pte differ, since the two do not share much of the code
path.

Patch layout
============

Part (1): Some fixes that I observed when working on this.  Feel free to
skip them for now because I think they're corner cases and irrelevant to the
major change:

  mm/thp: Simplify copying of huge zero page pmd when fork
  mm/userfaultfd: Fix uffd-wp special cases for fork()
  mm/userfaultfd: Fix a few thp pmd missing uffd-wp bit

Part (2): Shmem support.  This is where the special swap pte is introduced.
Some zap rework is needed along the way:

  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  mm: Pass zap_flags into unmap_mapping_pages()
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()

Part (3): Hugetlb support.  We need to disable huge pmd sharing for uffd-wp
because the two are not compatible, just like with uffd minor mode.
The rest is the changes required to teach hugetlbfs to understand the
special swap pte too, which is introduced with the uffd-wp change:

  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  hugetlb: Pass vma into huge_pte_alloc()
  hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
  mm/hugetlb: Introduce huge version of special swap pte helpers
  mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
  hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required

Part (4): Enable both features in code and test:

  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

I've tested it using both the userfaultfd kselftest program and umapsort
[2], which should be even stricter.  No complicated mm setup has been tested
yet besides page swapping in/out, and in any case we need more tests before
this becomes non-RFC.

If anyone would like to try umapsort, you need to use an extremely hacked
version of the umap library [3], because by default umap only supports
anonymous memory.  So to test it we need to build [3], then [2].

Any comments would be greatly welcome.

Thanks,

[0] https://lore.kernel.org/lkml/20201225092529.3228466-1-namit@vmware.com/
[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/LLNL/umap-apps
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs

Peter Xu (30):
  mm/thp: Simplify copying of huge zero page pmd when fork
  mm/userfaultfd: Fix uffd-wp special cases for fork()
  mm/userfaultfd: Fix a few thp pmd missing uffd-wp bit
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  mm: Pass zap_flags into unmap_mapping_pages()
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  hugetlb: Pass vma into huge_pte_alloc()
  hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
  mm/hugetlb: Introduce huge version of special swap pte helpers
  mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
  hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required
  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

 arch/arm64/mm/hugetlbpage.c              |   5 +-
 arch/ia64/mm/hugetlbpage.c               |   3 +-
 arch/mips/mm/hugetlbpage.c               |   4 +-
 arch/parisc/mm/hugetlbpage.c             |   2 +-
 arch/powerpc/mm/hugetlbpage.c            |   3 +-
 arch/s390/mm/hugetlbpage.c               |   2 +-
 arch/sh/mm/hugetlbpage.c                 |   2 +-
 arch/sparc/mm/hugetlbpage.c              |   2 +-
 arch/x86/include/asm/pgtable.h           |  28 +++
 fs/dax.c                                 |  10 +-
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  14 +-
 fs/userfaultfd.c                         |  80 +++++--
 include/asm-generic/hugetlb.h            |  10 +
 include/asm-generic/pgtable_uffd.h       |   3 +
 include/linux/huge_mm.h                  |   3 +-
 include/linux/hugetlb.h                  |  47 +++-
 include/linux/mm.h                       |  50 +++-
 include/linux/mm_inline.h                |  43 ++++
 include/linux/mmu_notifier.h             |   1 +
 include/linux/shmem_fs.h                 |   5 +-
 include/linux/swapops.h                  |  41 +++-
 include/linux/userfaultfd_k.h            |  37 +++
 include/uapi/linux/userfaultfd.h         |   3 +-
 mm/huge_memory.c                         |  36 ++-
 mm/hugetlb.c                             | 174 +++++++++++---
 mm/khugepaged.c                          |  14 +-
 mm/memcontrol.c                          |   2 +-
 mm/memory.c                              | 277 ++++++++++++++++++-----
 mm/migrate.c                             |   2 +-
 mm/mprotect.c                            |  63 +++++-
 mm/mremap.c                              |   2 +-
 mm/page_vma_mapped.c                     |   6 +-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |  39 +++-
 mm/truncate.c                            |  17 +-
 mm/userfaultfd.c                         |  37 +--
 tools/testing/selftests/vm/userfaultfd.c |  14 +-
 38 files changed, 881 insertions(+), 223 deletions(-)
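
Illustration of the is_swap_pte() cases from the quoted commit message: the
sketch below is a userspace toy model, not code from this series.  Every
toy_* name is hypothetical and exists only for this example; the only things
taken from the text above are the case labels (a.1), (a.2) and (b), and the
rule that a special swap pte must never be converted into a swap entry.  It
assumes that a "swap pte" is any pte that is neither none nor present.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_pte_kind {
	TOY_PTE_NONE,               /* empty pte */
	TOY_PTE_PRESENT,            /* pte maps a page */
	TOY_PTE_SWAP_NORMAL,        /* case (a.1): normal swap entry */
	TOY_PTE_SWAP_SPECIAL_ENTRY, /* case (a.2): migration, hw-poison, ... */
	TOY_PTE_SPECIAL_UFFD_WP,    /* case (b): special swap pte, no entry */
};

struct toy_pte {
	enum toy_pte_kind kind;
	unsigned int swp_type;      /* meaningful only for case (a) */
};

/* Assumed rule: a swap pte is neither none nor present; covers (a) and (b). */
static bool toy_is_swap_pte(struct toy_pte pte)
{
	return pte.kind != TOY_PTE_NONE && pte.kind != TOY_PTE_PRESENT;
}

/* Case (b): the pte carries information but holds no swap entry. */
static bool toy_is_special_swap_pte(struct toy_pte pte)
{
	return pte.kind == TOY_PTE_SPECIAL_UFFD_WP;
}

/* Case (a): only these ptes may be decoded into a swap entry. */
static bool toy_has_swap_entry(struct toy_pte pte)
{
	return toy_is_swap_pte(pte) && !toy_is_special_swap_pte(pte);
}

/* Stand-in for decoding the entry; asserts on the illegal case (b). */
static unsigned int toy_pte_to_swp_type(struct toy_pte pte)
{
	assert(toy_has_swap_entry(pte));
	return pte.swp_type;
}
```

The point of the model is the order of the checks: a caller that sees a swap
pte has to rule out case (b) first, and only then decode a swap entry.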