From patchwork Mon Jul 18 11:47:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12921634
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Andrew Morton, Nadav Amit, Mike Kravetz, Hugh Dickins,
 Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v2 3/5] userfaultfd: introduce write-likely mode for uffd operations
Date: Mon, 18 Jul 2022 04:47:46 -0700
Message-Id: <20220718114748.2623-4-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220718114748.2623-1-namit@vmware.com>
References: <20220718114748.2623-1-namit@vmware.com>
From: Nadav Amit

Introduce write-likely hints for uffd. A later patch will use these
hints to decide whether to attempt to map pages in the page table or
only to mark them logically as writable.

This lets userspace choose between faster access to a page and the
possibility of removing the page later, potentially without writeback
and a TLB flush.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 32 ++++++++++++++++++++++++--------
 include/linux/userfaultfd_k.h    |  1 +
 include/uapi/linux/userfaultfd.h | 13 ++++++++++++-
 3 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8d8792b27c53..3027d228550a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1709,7 +1709,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
 		goto out;
 	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
-				 UFFDIO_COPY_MODE_ACCESS_LIKELY))
+				 UFFDIO_COPY_MODE_ACCESS_LIKELY|
+				 UFFDIO_COPY_MODE_WRITE_LIKELY))
 		goto out;

 	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
@@ -1719,8 +1720,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_copy.mode & UFFDIO_COPY_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}

 	if (mmget_not_zero(ctx->mm)) {
@@ -1774,14 +1778,18 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 		goto out;
 	ret = -EINVAL;
 	if (uffdio_zeropage.mode & ~(UFFDIO_ZEROPAGE_MODE_DONTWAKE|
-				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY))
+				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY|
+				     UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY))
 		goto out;

 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}

 	if (mmget_not_zero(ctx->mm)) {
@@ -1834,7 +1842,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,

 	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
 			       UFFDIO_WRITEPROTECT_MODE_WP |
-			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY))
+			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY |
+			       UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY))
 		return -EINVAL;

 	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
@@ -1847,8 +1856,11 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}

 	if (mmget_not_zero(ctx->mm)) {
@@ -1903,14 +1915,18 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 		goto out;
 	}
 	if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE|
-				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY))
+				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY|
+				     UFFDIO_CONTINUE_MODE_WRITE_LIKELY))
 		goto out;

 	if (ctx->features & UFFD_FEATURE_ACCESS_HINTS) {
 		if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_ACCESS_LIKELY)
 			uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_WRITE_LIKELY)
+			uffd_flags |= UFFD_FLAGS_WRITE_LIKELY;
 	} else {
-		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY |
+			      UFFD_FLAGS_WRITE_LIKELY;
 	}

 	if (mmget_not_zero(ctx->mm)) {
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index b326798b5677..4968c86938b2 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -60,6 +60,7 @@ typedef unsigned int __bitwise uffd_flags_t;
 #define UFFD_FLAGS_NONE			((__force uffd_flags_t)0)
 #define UFFD_FLAGS_WP			((__force uffd_flags_t)BIT(0))
 #define UFFD_FLAGS_ACCESS_LIKELY	((__force uffd_flags_t)BIT(1))
+#define UFFD_FLAGS_WRITE_LIKELY		((__force uffd_flags_t)BIT(2))

 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 02e0c1f56939..f52cbe4c9c44 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -202,7 +202,7 @@ struct uffdio_api {
 	 * write-protection mode is supported on both shmem and hugetlbfs.
 	 *
 	 * UFFD_FEATURE_ACCESS_HINTS indicates that the ioctl operations
-	 * support the UFFDIO_*_MODE_ACCESS_LIKELY hints.
+	 * support the UFFDIO_*_MODE_[ACCESS|WRITE]_LIKELY hints.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -257,9 +257,13 @@ struct uffdio_copy {
 	 * page is likely to be access in the near future. Providing the hint
 	 * properly can improve performance.
 	 *
+	 * UFFDIO_COPY_MODE_WRITE_LIKELY provides a hint to the kernel that the
+	 * page is likely to be written in the near future. Providing the hint
+	 * properly can improve performance.
 	 */
 #define UFFDIO_COPY_MODE_WP			((__u64)1<<1)
 #define UFFDIO_COPY_MODE_ACCESS_LIKELY		((__u64)1<<2)
+#define UFFDIO_COPY_MODE_WRITE_LIKELY		((__u64)1<<3)
 	__u64 mode;

 	/*
@@ -273,6 +277,7 @@ struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
 #define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<1)
+#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((__u64)1<<2)
 	__u64 mode;

 	/*
@@ -296,6 +301,10 @@ struct uffdio_writeprotect {
  * that the page is likely to be access in the near future. Providing
  * the hint properly can improve performance.
  *
+ * UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY: provides a hint to the kernel
+ * that the page is likely to be written in the near future. Providing
+ * the hint properly can improve performance.
+ *
  * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
  * therefore DONTWAKE flag is meaningless with WP=1. Removing write
  * protection (WP=0) in response to a page fault wakes the faulting
@@ -304,6 +313,7 @@ struct uffdio_writeprotect {
 #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
 #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
 #define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY	((__u64)1<<2)
+#define UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY	((__u64)1<<3)
 	__u64 mode;
 };

@@ -311,6 +321,7 @@ struct uffdio_continue {
 	struct uffdio_range range;
 #define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
 #define UFFDIO_CONTINUE_MODE_ACCESS_LIKELY	((__u64)1<<1)
+#define UFFDIO_CONTINUE_MODE_WRITE_LIKELY	((__u64)1<<2)
 	__u64 mode;

 	/*