From patchwork Wed Aug 21 04:03:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11105381 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 65E8114DE for ; Wed, 21 Aug 2019 04:04:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2502F22CF7 for ; Wed, 21 Aug 2019 04:04:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="WIFEFpxC" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2502F22CF7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 74C9F6B028A; Wed, 21 Aug 2019 00:04:02 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6B17E6B0287; Wed, 21 Aug 2019 00:04:02 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5268E6B028A; Wed, 21 Aug 2019 00:04:02 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0197.hostedemail.com [216.40.44.197]) by kanga.kvack.org (Postfix) with ESMTP id 28D406B0287 for ; Wed, 21 Aug 2019 00:04:02 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id A859E68BF for ; Wed, 21 Aug 2019 04:04:01 +0000 (UTC) X-FDA: 75845091882.29.door42_2313b6caadc27 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,jhubbard@nvidia.com,:akpm@linux-foundation.org:hch@infradead.org:dan.j.williams@intel.com:david@fromorbit.com:ira.weiny@intel.com:jack@suse.cz:jgg@ziepe.ca:jglisse@redhat.com:vbabka@suse.cz:linux-kernel@vger.kernel.org::linux-fsdevel@vger.kernel.org:linux-rdma@vger.kernel.org:jhubbard@nvidia.com:mhocko@kernel.org,RULES_HIT:30003:30012:30054:30064:30070:30090,0,RBL:216.228.121.143:@nvidia.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: door42_2313b6caadc27 X-Filterd-Recvd-Size: 9059 Received: from hqemgate14.nvidia.com (hqemgate14.nvidia.com [216.228.121.143]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Wed, 21 Aug 2019 04:04:00 +0000 (UTC) Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Tue, 20 Aug 2019 21:03:57 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Tue, 20 Aug 2019 21:03:57 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Tue, 20 Aug 2019 21:03:57 -0700 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 21 Aug 2019 04:03:57 +0000 Received: from hqnvemgw02.nvidia.com (172.16.227.111) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 21 Aug 2019 04:03:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw02.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Tue, 20 Aug 2019 21:03:56 -0700 From: John Hubbard To: Andrew Morton CC: Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Vlastimil Babka , LKML , , , , John Hubbard , Michal Hocko Subject: [PATCH 3/4] mm/gup: introduce FOLL_PIN flag for get_user_pages() Date: Tue, 20 Aug 2019 21:03:54 -0700 Message-ID: <20190821040355.19566-3-jhubbard@nvidia.com> X-Mailer: git-send-email 2.22.1 In-Reply-To: <20190821040355.19566-1-jhubbard@nvidia.com> References: <20190821040355.19566-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1566360237; bh=5r1fejuMLYlPmFKrbU5YkdWKiDn6t75myInCYFT6WEU=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=WIFEFpxCtE6OwHNCIdo2NZnxXjKCOObWWTR/5bPnu4OTiQxbaaNtQ88ilav5mkb8e oi0VcWLet2xwR4MkTdU+8ReBwF00cDt0fCuZrzyvuGb6CXP9affFbctpBJHCsHQ10B 0skkD0l1WJzTts31vkmZuRaWUjd9Wkk6THQplpRacXxWgzCLaX36mIuNd4zv7WzRt9 9eap0lu+lgOI9bIQ6qW3m6ykB9I0F5l9seBAIHdtgtiy1ui7q+ll2eqNR7uBN6XzKu WYTEWcvEJWpqN7ZAYHDwCQqJBsv47Kviys5ZrStsJDUoisgl1GbRChKt1KztKfNVMa wKy2nX60T0cnw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: FOLL_PIN is set by callers of vaddr_pin_pages(). This is different than FOLL_LONGTERM, because even short term page pins need a new kind of tracking, if those pinned pages' data is going to potentially be modified. This situation is described in more detail in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). FOLL_PIN is added now, rather than waiting until there is code that takes action based on FOLL_PIN. That's because having FOLL_PIN in the code helps to highlight the differences between: a) get_user_pages(): soon to be deprecated. Used to pin pages, but without awareness of file systems that might use those pages, b) The original vaddr_pin_pages(): intended only for FOLL_LONGTERM and DAX use cases. This assumes direct IO and therefore is not applicable the most of the other callers of get_user_pages(), and Also add fairly extensive documentation of the meaning and use of both FOLL_PIN and FOLL_LONGTERM. Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in this documentation. (I've reworded it and expanded on it slightly.) Cc: Vlastimil Babka Cc: Jan Kara Cc: Michal Hocko Cc: Ira Weiny Signed-off-by: John Hubbard --- include/linux/mm.h | 56 +++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 6 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index bc675e94ddf8..6e7de424bf5e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2644,6 +2644,8 @@ static inline vm_fault_t vmf_error(int err) struct page *follow_page(struct vm_area_struct *vma, unsigned long address, unsigned int foll_flags); +/* Flags for follow_page(), get_user_pages ("GUP"), and vaddr_pin_pages(): */ + #define FOLL_WRITE 0x01 /* check pte is writable */ #define FOLL_TOUCH 0x02 /* mark page accessed */ #define FOLL_GET 0x04 /* do get_page on page */ @@ -2663,13 +2665,15 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_ANON 0x8000 /* don't do file mappings */ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ +#define FOLL_PIN 0x40000 /* pages must be released via put_user_page() */ /* - * NOTE on FOLL_LONGTERM: + * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each + * other. Here is what they mean, and how to use them: * * FOLL_LONGTERM indicates that the page will be held for an indefinite time - * period _often_ under userspace control. This is contrasted with - * iov_iter_get_pages() where usages which are transient. + * period _often_ under userspace control. This is in contrast to + * iov_iter_get_pages(), where usages which are transient. * * FIXME: For pages which are part of a filesystem, mappings are subject to the * lifetime enforced by the filesystem and we need guarantees that longterm @@ -2684,11 +2688,51 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * Currently only get_user_pages() and get_user_pages_fast() support this flag * and calls to get_user_pages_[un]locked are specifically not allowed. This * is due to an incompatibility with the FS DAX check and - * FAULT_FLAG_ALLOW_RETRY + * FAULT_FLAG_ALLOW_RETRY. * - * In the CMA case: longterm pins in a CMA region would unnecessarily fragment - * that region. And so CMA attempts to migrate the page before pinning when + * In the CMA case: long term pins in a CMA region would unnecessarily fragment + * that region. And so, CMA attempts to migrate the page before pinning, when * FOLL_LONGTERM is specified. + * + * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount, + * but an additional pin counting system) will be invoked. This is intended for + * anything that gets a page reference and then touches page data (for example, + * Direct IO). This lets the filesystem know that some non-file-system entity is + * potentially changing the pages' data. FOLL_PIN pages must be released, + * ultimately, by a call to put_user_page(). Typically that will be via one of + * the vaddr_unpin_pages() variants. + * + * FIXME: note that this special tracking is not in place yet. However, the + * pages should still be released by put_user_page(). + * + * When and where to use each flag: + * + * CASE 1: Direct IO (DIO). There are GUP references to pages that are serving + * as DIO buffers. These buffers are needed for a relatively short time (so they + * are not "long term"). No special synchronization with page_mkclean() or + * munmap() is provided. Therefore, flags to set at the call site are: + * + * FOLL_PIN + * + * CASE 2: RDMA. There are GUP references to pages that are serving as DMA + * buffers. These buffers are needed for a long time ("long term"). No special + * synchronization with page_mkclean() or munmap() is provided. Therefore, flags + * to set at the call site are: + * + * FOLL_PIN | FOLL_LONGTERM + * + * There is also a special case when the pages are DAX pages: in addition to the + * above flags, the caller needs a file lease. This is provided via the struct + * vaddr_pin argument to vaddr_pin_pages(). + * + * CASE 3: ODP (Mellanox/Infiniband On Demand Paging: the hardware supports + * replayable page faulting). There are GUP references to pages serving as DMA + * buffers. For ODP, MMU notifiers are used to synchronize with page_mkclean() + * and munmap(). Therefore, normal GUP calls are sufficient, so neither flag + * needs to be set. + * + * CASE 4: pinning for struct page manipulation only. Here, normal GUP calls are + * sufficient, so neither flag needs to be set. */ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)