From patchwork Wed Nov 16 10:26:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13044673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2BBD7C433FE for ; Wed, 16 Nov 2022 10:27:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B47616B0071; Wed, 16 Nov 2022 05:27:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id ACEEE8E0002; Wed, 16 Nov 2022 05:27:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8FA548E0001; Wed, 16 Nov 2022 05:27:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 7E1CF6B0071 for ; Wed, 16 Nov 2022 05:27:31 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 56C521C507A for ; Wed, 16 Nov 2022 10:27:31 +0000 (UTC) X-FDA: 80138928702.06.655B291 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf17.hostedemail.com (Postfix) with ESMTP id 530A740009 for ; Wed, 16 Nov 2022 10:27:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668594449; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=/1/81f1bc3wgoHEbP5M1Zz0wSDu9jba91U8WGSjmQ70=; b=P7FQeh3Abt0OM4TIXm/n0nBQJzUQhSxfBWgy+WoKGk8Fa6siWk/cXganc3h/GvqCdCdtpY Xc/mKcLUyt4BJ+JhIPefeDH18U2vWg0q/5Xfy9d9UxheBZoSrg4N8a4iY1BPBBfTsBPES5 zBeakCB1gDDGktVMZn7GGm84BjnquYE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-247-_PUeKXqLO9GD6bVT-vux3w-1; Wed, 16 Nov 2022 05:27:26 -0500 X-MC-Unique: _PUeKXqLO9GD6bVT-vux3w-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 71F03101A528; Wed, 16 Nov 2022 10:27:23 +0000 (UTC) Received: from t480s.fritz.box (unknown [10.39.193.216]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6687D2024CCA; Wed, 16 Nov 2022 10:27:02 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: x86@kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, etnaviv@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-samsung-soc@vger.kernel.org, linux-rdma@vger.kernel.org, linux-media@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-perf-users@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kselftest@vger.kernel.org, Linus Torvalds , Andrew Morton , Jason Gunthorpe , John Hubbard , Peter Xu , Greg Kroah-Hartman , Andrea Arcangeli , Hugh Dickins , Nadav Amit , Vlastimil Babka , Matthew Wilcox , Mike Kravetz , Muchun Song , Shuah Khan , Lucas Stach , David Airlie , Oded Gabbay , Arnd Bergmann , Christoph Hellwig , Alex Williamson , David Hildenbrand , Alexander Shishkin , Alexander Viro , Andy Walls , Anton Ivanov , Arnaldo Carvalho de Melo , Bernard Metzler , Borislav Petkov , Catalin Marinas , Christian Benvenuti , Christian Gmeiner , Christophe Leroy , Daniel Vetter , Daniel Vetter , Dave Hansen , "David S. Miller" , Dennis Dalessandro , Eric Biederman , Hans Verkuil , "H. Peter Anvin" , Ingo Molnar , Inki Dae , Ivan Kokshaysky , James Morris , Jiri Olsa , Johannes Berg , Kees Cook , Kentaro Takeda , Krzysztof Kozlowski , Kyungmin Park , Leon Romanovsky , Leon Romanovsky , Marek Szyprowski , Mark Rutland , Matt Turner , Mauro Carvalho Chehab , Michael Ellerman , Namhyung Kim , Nelson Escobar , Nicholas Piggin , Oleg Nesterov , Paul Moore , Peter Zijlstra , Richard Henderson , Richard Weinberger , Russell King , "Serge E. Hallyn" , Seung-Woo Kim , Tetsuo Handa , Thomas Bogendoerfer , Thomas Gleixner , Tomasz Figa , Will Deacon Subject: [PATCH mm-unstable v1 00/20] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning) Date: Wed, 16 Nov 2022 11:26:39 +0100 Message-Id: <20221116102659.70287-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1668594451; a=rsa-sha256; cv=none; b=6gR/1U5nRBTNvuKiU8ayzZdlu6AUvsHekUMVoVhIlleziufaUQmdq6FNTOtl4FwMUWRn3v X7TubZRs5ueoMUjcuPafPDFARPQ/id+MY4b/ZHAMTu+xOX5E/P+f6y4KhzDEVGsHQSSf24 565qdODbj/m9eyA/ftZQDHzMKLHdmRc= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=P7FQeh3A; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf17.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1668594451; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding:in-reply-to: references:dkim-signature; bh=/1/81f1bc3wgoHEbP5M1Zz0wSDu9jba91U8WGSjmQ70=; b=bhiPbWfElEi6MYy7d662hcIsIu8tn3KDbJbfNCBHYlbUxvFeYVp/LFVG9veqz0t/Lsv139 Ryi0V/XUYj3GEBtXW5xmG+dWg5hNS+Mfqs4Dmo7OP1kaJxcmMUocJ0D0lVdRPsbkKxDNKn s3OcO5jnDFtFo10XwbnU7OgH61VydRA= Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=P7FQeh3A; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf17.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com X-Rspam-User: X-Stat-Signature: tot33hteondcz1k7n4cmutyd4r47gigi X-Rspamd-Queue-Id: 530A740009 X-Rspamd-Server: rspam11 X-HE-Tag: 1668594450-228929 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For now, we did not support reliable R/O long-term pinning in COW mappings. That means, if we would trigger R/O long-term pinning in MAP_PRIVATE mapping, we could end up pinning the (R/O-mapped) shared zeropage or a pagecache page. The next write access would trigger a write fault and replace the pinned page by an exclusive anonymous page in the process page table; whatever the process would write to that private page copy would not be visible by the owner of the previous page pin: for example, RDMA could read stale data. The end result is essentially an unexpected and hard-to-debug memory corruption. Some drivers tried working around that limitation by using "FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning for now. FOLL_WRITE would trigger a write fault, if required, and break COW before pinning the page. FOLL_FORCE is required because the VMA might lack write permissions, and drivers wanted to make that working as well, just like one would expect (no write access, but still triggering a write access to break COW). However, that is not a practical solution, because (1) Drivers that don't stick to that undocumented and debatable pattern would still run into that issue. For example, VFIO only uses FOLL_LONGTERM for R/O long-term pinning. (2) Using FOLL_WRITE just to work around a COW mapping + page pinning limitation is unintuitive. FOLL_WRITE would, for example, mark the page softdirty or trigger uffd-wp, even though, there actually isn't going to be any write access. (3) The purpose of FOLL_FORCE is debug access, not access without lack of VMA permissions by arbitrarty drivers. So instead, make R/O long-term pinning work as expected, by breaking COW in a COW mapping early, such that we can remove any FOLL_FORCE usage from drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE). More details in patch #8. Patches #1--#3 add COW tests for non-anonymous pages. Patches #4--#7 prepare core MM for extended FAULT_FLAG_UNSHARE support in COW mappings. Patch #8 implements reliable R/O long-term pinning in COW mappings Patches #9--#19 remove any FOLL_FORCE usage from drivers. Patch #20 renames FOLL_FORCE to FOLL_PTRACE. I'm refraining from CCing all driver/arch maintainers on the whole patch set, but only CC them on the cover letter and the applicable patch (I know, I know, someone is always unhappy ... sorry). RFC -> v1: * Use term "ptrace" instead of "debuggers" in patch descriptions * Added ACK/Tested-by * "mm/frame-vector: remove FOLL_FORCE usage" -> Adjust description * "mm: rename FOLL_FORCE to FOLL_PTRACE" -> Added David Hildenbrand (20): selftests/vm: anon_cow: prepare for non-anonymous COW tests selftests/vm: cow: basic COW tests for non-anonymous pages selftests/vm: cow: R/O long-term pinning reliability tests for non-anon pages mm: add early FAULT_FLAG_UNSHARE consistency checks mm: add early FAULT_FLAG_WRITE consistency checks mm: rework handling in do_wp_page() based on private vs. shared mappings mm: don't call vm_ops->huge_fault() in wp_huge_pmd()/wp_huge_pud() for private mappings mm: extend FAULT_FLAG_UNSHARE support to anything in a COW mapping mm/gup: reliable R/O long-term pinning in COW mappings RDMA/umem: remove FOLL_FORCE usage RDMA/usnic: remove FOLL_FORCE usage RDMA/siw: remove FOLL_FORCE usage media: videobuf-dma-sg: remove FOLL_FORCE usage drm/etnaviv: remove FOLL_FORCE usage media: pci/ivtv: remove FOLL_FORCE usage mm/frame-vector: remove FOLL_FORCE usage drm/exynos: remove FOLL_FORCE usage RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage habanalabs: remove FOLL_FORCE usage mm: rename FOLL_FORCE to FOLL_PTRACE arch/alpha/kernel/ptrace.c | 6 +- arch/arm64/kernel/mte.c | 2 +- arch/ia64/kernel/ptrace.c | 10 +- arch/mips/kernel/ptrace32.c | 4 +- arch/mips/math-emu/dsemul.c | 2 +- arch/powerpc/kernel/ptrace/ptrace32.c | 4 +- arch/sparc/kernel/ptrace_32.c | 4 +- arch/sparc/kernel/ptrace_64.c | 8 +- arch/x86/kernel/step.c | 2 +- arch/x86/um/ptrace_32.c | 2 +- arch/x86/um/ptrace_64.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_gem.c | 8 +- drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +- drivers/infiniband/core/umem.c | 8 +- drivers/infiniband/hw/qib/qib_user_pages.c | 2 +- drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +- drivers/infiniband/sw/siw/siw_mem.c | 9 +- drivers/media/common/videobuf2/frame_vector.c | 2 +- drivers/media/pci/ivtv/ivtv-udma.c | 2 +- drivers/media/pci/ivtv/ivtv-yuv.c | 5 +- drivers/media/v4l2-core/videobuf-dma-sg.c | 14 +- drivers/misc/habanalabs/common/memory.c | 3 +- fs/exec.c | 2 +- fs/proc/base.c | 2 +- include/linux/mm.h | 35 +- include/linux/mm_types.h | 8 +- kernel/events/uprobes.c | 4 +- kernel/ptrace.c | 12 +- mm/gup.c | 38 +- mm/huge_memory.c | 13 +- mm/hugetlb.c | 14 +- mm/memory.c | 97 +++-- mm/util.c | 4 +- security/tomoyo/domain.c | 2 +- tools/testing/selftests/vm/.gitignore | 2 +- tools/testing/selftests/vm/Makefile | 10 +- tools/testing/selftests/vm/check_config.sh | 4 +- .../selftests/vm/{anon_cow.c => cow.c} | 387 +++++++++++++++++- tools/testing/selftests/vm/run_vmtests.sh | 8 +- 39 files changed, 575 insertions(+), 177 deletions(-) rename tools/testing/selftests/vm/{anon_cow.c => cow.c} (75%)