From patchwork Fri Jan 15 19:04:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12023857
Date: Fri, 15 Jan 2021 11:04:50 -0800
In-Reply-To: <20210115190451.3135416-1-axelrasmussen@google.com>
Message-Id: <20210115190451.3135416-9-axelrasmussen@google.com>
References: <20210115190451.3135416-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.30.0.284.gd98b1dd5eaa7-goog
Subject: [PATCH 8/9] userfaultfd: update documentation to describe minor fault handling
From: Axel Rasmussen
To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
 Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
 Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
 "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
 Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
 Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt, Steven Price,
 Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, Adam Ruprecht, Axel Rasmussen, Cannon Matthews,
 "Dr . David Alan Gilbert", David Rientjes, Oliver Upton

Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to
intercept and resolve minor faults. Make it clear that COPY and ZEROPAGE
are used for MISSING faults, whereas CONTINUE is used for MINOR faults.

Signed-off-by: Axel Rasmussen
---
 Documentation/admin-guide/mm/userfaultfd.rst | 105 +++++++++++--------
 1 file changed, 64 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 65eefa66c0ba..67f2c68e65a2 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -63,36 +63,36 @@ the generic ioctl available.
 
 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_FAULT_HUGETLBFS`` indicates that the kernel
+  supports ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs
+  virtual memory areas.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +100,44 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which of these are used depends on the kind of fault:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, a new page has to be
+  provided. This can be done with either ``UFFDIO_COPY`` or
+  ``UFFDIO_ZEROPAGE``. The default (non-userfaultfd) behavior would be to
+  provide a zero page, but in userfaultfd this is left up to userspace.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, an existing page already
+  exists. Userspace needs to ensure its contents are correct (if it needs
+  to be modified, by writing directly to the non-userfaultfd-registered
+  side of shared memory), and then issue ``UFFDIO_CONTINUE`` to resolve
+  the fault.
 
 Notes:
 
-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-  you must provide some kind of page in your thread after reading from
-  the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-  The normal behavior of the OS automatically providing a zero page on
-  an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+  ``pagefault.flags`` within the ``uffd_msg``, checking for the
+  ``UFFD_PAGEFAULT_FLAG_*`` flags.
 
 - None of the page-delivering ioctls default to the range that you
   registered with. You must fill in all fields for the appropriate
@@ -122,9 +145,9 @@ Notes:
 
 - You get the address of the access that triggered the missing page
   event out of a struct uffd_msg that you read in the thread from the
-  uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
-  ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
-  the first of any of those IOCTLs wakes up the faulting thread.
+  uffd. You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.
 
 - Be sure to test for all errors including
   (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges