From patchwork Tue Mar 14 22:12:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 13175063
Date: Tue, 14 Mar 2023 15:12:50 -0700
In-Reply-To: <20230314221250.682452-1-axelrasmussen@google.com>
References: <20230314221250.682452-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog
Message-ID: <20230314221250.682452-5-axelrasmussen@google.com>
Subject: [PATCH v5 4/4] mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Hugh Dickins, Jan Kara,
 "Liam R. Howlett", Matthew Wilcox, Mike Kravetz, Mike Rapoport,
 Muchun Song, Nadav Amit, Peter Xu, Shuah Khan
Cc: James Houghton, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, Axel Rasmussen

UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new
PTE to resolve a missing fault, one can install a write-protected one.
This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in
combination.

This was motivated by testing HugeTLB HGM [1], and in particular its
interaction with userfaultfd features. Existing userfaultfd code
supports using WP and MINOR modes together (i.e. you can register an
area with both enabled), but without this CONTINUE flag the combination
is in practice unusable.

So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing
as UFFDIO_COPY_MODE_WP, but for *minor* faults.

Update the selftest to do some very basic exercising of the new flag.

Update Documentation/ to describe how these flags are used (neither the
COPY nor the new CONTINUE versions of this mode flag were described there
before).

[1]: https://patchwork.kernel.org/project/linux-mm/cover/20230218002819.1486479-1-jthoughton@google.com/

Acked-by: Peter Xu
Acked-by: Mike Rapoport (IBM)
Signed-off-by: Axel Rasmussen
---
 Documentation/admin-guide/mm/userfaultfd.rst | 8 ++++++++
 fs/userfaultfd.c                             | 8 ++++++--
 include/linux/userfaultfd_k.h                | 3 ++-
 include/uapi/linux/userfaultfd.h             | 7 +++++++
 mm/userfaultfd.c                             | 5 +++--
 tools/testing/selftests/mm/userfaultfd.c     | 4 ++++
 6 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 7dc823b56ca4..0ce400f8da93 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -219,6 +219,14 @@ former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
 you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
 used.
 
+When using ``UFFDIO_REGISTER_MODE_WP`` in combination with either
+``UFFDIO_REGISTER_MODE_MISSING`` or ``UFFDIO_REGISTER_MODE_MINOR``, when
+resolving missing / minor faults with ``UFFDIO_COPY`` or ``UFFDIO_CONTINUE``
+respectively, it may be desirable for the new page / mapping to be
+write-protected (so future writes will also result in a WP fault). These ioctls
+support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
+respectively) to configure the mapping this way.
+
 QEMU/KVM
 ========
 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 56e54e50414e..664019381e04 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1878,6 +1878,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	struct uffdio_continue uffdio_continue;
 	struct uffdio_continue __user *user_uffdio_continue;
 	struct userfaultfd_wake_range range;
+	uffd_flags_t flags = 0;
 
 	user_uffdio_continue = (struct uffdio_continue __user *)arg;
 
@@ -1902,13 +1903,16 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	    uffdio_continue.range.start) {
 		goto out;
 	}
-	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+	if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE |
+				     UFFDIO_CONTINUE_MODE_WP))
 		goto out;
+	if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_WP)
+		flags |= MFILL_ATOMIC_WP;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_atomic_continue(ctx->mm, uffdio_continue.range.start,
 					    uffdio_continue.range.len,
-					    &ctx->mmap_changing);
+					    &ctx->mmap_changing, flags);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a948d92154f5..fd6d7d80b6ea 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -83,7 +83,8 @@ extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm,
 				     unsigned long len,
 				     atomic_t *mmap_changing);
 extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start,
-				     unsigned long len, atomic_t *mmap_changing);
+				     unsigned long len, atomic_t *mmap_changing,
+				     uffd_flags_t flags);
 extern int mwriteprotect_range(struct mm_struct *dst_mm,
 			       unsigned long start, unsigned long len,
 			       bool enable_wp, atomic_t *mmap_changing);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 005e5e306266..14059a0861bf 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -297,6 +297,13 @@ struct uffdio_writeprotect {
 struct uffdio_continue {
 	struct uffdio_range range;
 #define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
+	/*
+	 * UFFDIO_CONTINUE_MODE_WP will map the page write protected on
+	 * the fly.  UFFDIO_CONTINUE_MODE_WP is available only if the
+	 * write protected ioctl is implemented for the range
+	 * according to the uffdio_register.ioctls.
+	 */
+#define UFFDIO_CONTINUE_MODE_WP			((__u64)1<<1)
 	__u64 mode;
 
 	/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9202c1fc79ba..048beb5d0edd 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -693,10 +693,11 @@ ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, unsigned long start,
 }
 
 ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
-			      unsigned long len, atomic_t *mmap_changing)
+			      unsigned long len, atomic_t *mmap_changing,
+			      uffd_flags_t flags)
 {
 	return mfill_atomic(dst_mm, start, 0, len, mmap_changing,
-			    uffd_flags_set_mode(0, MFILL_ATOMIC_CONTINUE));
+			    uffd_flags_set_mode(flags, MFILL_ATOMIC_CONTINUE));
 }
 
 long uffd_wp_range(struct vm_area_struct *dst_vma,
diff --git a/tools/testing/selftests/mm/userfaultfd.c b/tools/testing/selftests/mm/userfaultfd.c
index 7f22844ed704..41c1f9abc481 100644
--- a/tools/testing/selftests/mm/userfaultfd.c
+++ b/tools/testing/selftests/mm/userfaultfd.c
@@ -585,6 +585,8 @@ static void continue_range(int ufd, __u64 start, __u64 len)
 	req.range.start = start;
 	req.range.len = len;
 	req.mode = 0;
+	if (test_uffdio_wp)
+		req.mode |= UFFDIO_CONTINUE_MODE_WP;
 
 	if (ioctl(ufd, UFFDIO_CONTINUE, &req))
 		err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
@@ -1332,6 +1334,8 @@ static int userfaultfd_minor_test(void)
 	uffdio_register.range.start = (unsigned long)area_dst_alias;
 	uffdio_register.range.len = nr_pages * page_size;
 	uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
+	if (test_uffdio_wp)
+		uffdio_register.mode |= UFFDIO_REGISTER_MODE_WP;
 	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register))
 		err("register failure");
 
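
Editorial note, not part of the patch: the Documentation/ hunk above says that UFFDIO_CONTINUE accepts a mode flag to install the mapping write-protected. A minimal userspace sketch of that call might look like the following. It assumes the range was already registered with both UFFDIO_REGISTER_MODE_MINOR and UFFDIO_REGISTER_MODE_WP, and that the installed UAPI headers provide struct uffdio_continue. The helper names make_continue_req() and uffd_continue_range() are hypothetical, and the #ifndef fallback simply repeats the value this patch adds to include/uapi/linux/userfaultfd.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Fallback for UAPI headers that predate this patch. The value is the
 * one added to include/uapi/linux/userfaultfd.h in the diff above.
 */
#ifndef UFFDIO_CONTINUE_MODE_WP
#define UFFDIO_CONTINUE_MODE_WP ((__u64)1 << 1)
#endif

/*
 * Hypothetical helper: build a UFFDIO_CONTINUE request covering
 * [start, start + len). When wp is true, ask the kernel to install
 * the mapping write-protected.
 */
struct uffdio_continue make_continue_req(__u64 start, __u64 len, bool wp)
{
	struct uffdio_continue req;

	memset(&req, 0, sizeof(req));
	req.range.start = start;
	req.range.len = len;
	if (wp)
		req.mode |= UFFDIO_CONTINUE_MODE_WP;
	return req;
}

/*
 * Hypothetical helper: resolve a minor fault on the given range.
 * Returns 0 on success or a negative errno value on failure.
 */
int uffd_continue_range(int uffd, __u64 start, __u64 len, bool wp)
{
	struct uffdio_continue req = make_continue_req(start, len, wp);

	if (ioctl(uffd, UFFDIO_CONTINUE, &req) == -1)
		return -errno;
	return 0;
}
```

A fault handler would call uffd_continue_range(uffd, fault_addr & ~(page_size - 1), page_size, true) after populating the page cache. On a kernel without this patch the new mode bit is rejected by the mode check in userfaultfd_continue(), so a caller may want to fall back to a plain UFFDIO_CONTINUE followed by UFFDIO_WRITEPROTECT.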