From patchwork Wed Oct 30 22:49:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220425 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B10A1599 for ; Wed, 30 Oct 2019 22:52:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2F3102087F for ; Wed, 30 Oct 2019 22:52:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="LHZ6lU3b" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727624AbfJ3Wto (ORCPT ); Wed, 30 Oct 2019 18:49:44 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10796 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727165AbfJ3Wtn (ORCPT ); Wed, 30 Oct 2019 18:49:43 -0400 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:41 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:35 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 30 Oct 2019 15:49:35 -0700 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:35 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:34 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:34 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH 01/19] mm/gup: pass flags arg to __gup_device_* functions Date: Wed, 30 Oct 2019 15:49:12 -0700 Message-ID: <20191030224930.3990755-2-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475781; bh=d7EUGpQRTMvoYc+6+aLdOHouLAD5TlAUer68L/VNpoI=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=LHZ6lU3b+Rmuxt6ntgTWgWPO1DjlyJfgSFkeATOM45bmCiGsVgXIRtuRcHVay2m7y QkorONll+8JNtmmEP2Ct98iMV9oni8t13QTqaMFchuXeLDFzTcNhK19kYx95+C5Au9 ze2QMsU3TEAUahyq5IgliLnkeuV5Me50VDXXYGrkbu6XyUmKt/wogCCFoTKEEA1z7c c3/5zdaDb9u8RFGw3NMmUBcQNRtDGOl+4IErI5FymIoWyPpeQwvVnUcl9RiCCbNOQE /CaCRNf9WNtGpnCVwW6i7NQfkRAktZTVjBFJUZO66X8K050+YYMC6fvZ5jriiMtRSE +LYcxSYecvZcg== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org A subsequent patch requires access to gup flags, so pass the flags argument through to the __gup_device_* functions. Also placate checkpatch.pl by shortening a nearby line. Cc: Kirill A. Shutemov Signed-off-by: John Hubbard Reviewed-by: Ira Weiny --- mm/gup.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 8f236a335ae9..85caf76b3012 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1890,7 +1890,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE) static int __gup_device_huge(unsigned long pfn, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { int nr_start = *nr; struct dev_pagemap *pgmap = NULL; @@ -1916,13 +1917,14 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, } static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { unsigned long fault_pfn; int nr_start = *nr; fault_pfn = pmd_pfn(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, pages, nr)) + if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { @@ -1933,13 +1935,14 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, } static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { unsigned long fault_pfn; int nr_start = *nr; fault_pfn = pud_pfn(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, pages, nr)) + if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { @@ -1950,14 +1953,16 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, } #else static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { BUILD_BUG(); return 0; } static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { BUILD_BUG(); return 0; @@ -2062,7 +2067,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, if (pmd_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; - return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr); + return __gup_device_huge_pmd(orig, pmdp, addr, end, flags, + pages, nr); } refs = 0; @@ -2092,7 +2098,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, } static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, - unsigned long end, unsigned int flags, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { struct page *head, *page; int refs; @@ -2103,7 +2110,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, if (pud_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; - return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr); + return __gup_device_huge_pud(orig, pudp, addr, end, flags, + pages, nr); } refs = 0; From patchwork Wed Oct 30 22:49:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220353 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 85A4C13B1 for ; Wed, 30 Oct 2019 22:51:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 57E3E20874 for ; Wed, 30 Oct 2019 22:51:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="q6Q/A8TB" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728135AbfJ3Wv5 (ORCPT ); Wed, 30 Oct 2019 18:51:57 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:1382 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727671AbfJ3Wts (ORCPT ); Wed, 30 Oct 2019 18:49:48 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:43 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:37 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:37 -0700 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:37 +0000 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:37 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:35 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:35 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , "Christoph Hellwig" , "Aneesh Kumar K . V" Subject: [PATCH 02/19] mm/gup: factor out duplicate code from four routines Date: Wed, 30 Oct 2019 15:49:13 -0700 Message-ID: <20191030224930.3990755-3-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475783; bh=tBh63Up4GmYWMpZfv2SoHkQER3l3/AwmpTx4CmGyUOY=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=q6Q/A8TBBZ55PyF/FZZqYnJ+lgBDwYNDtC48DaucDzCF0TjR8x89MUFmfJrXfVIy8 YhLIOpQBeIReIooXA8TPZibffV6cpNr4lEyyTxNLOCl4fSlAiNzImWEmpAHpBQAgnm 2VJfy72jnhXV4Na9artoMHPQqpC3lkGe4ymiRJWOT1WeuiuvX5ZsQGDmnAJG4LGImR 9ty1Onsdh/I+zLky4Hr6hnCHYagTmLcM/PUFXTvUJ2RHGzNqzx9q0kjJ96JHXHJJWk 2oGM89HTfGbtaCai1JlzNt3fE78ewH7qAVLMEPdwaEG9DhYOb9MPVg/HVxjVhX4dEl BY9wQk9uQlmQw== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org There are four locations in gup.c that have a fair amount of code duplication. This means that changing one requires making the same changes in four places, not to mention reading the same code four times, and wondering if there are subtle differences. Factor out the common code into static functions, thus reducing the overall line count and the code's complexity. Also, take the opportunity to slightly improve the efficiency of the error cases, by doing a mass subtraction of the refcount, surrounded by get_page()/put_page(). Also, further simplify (slightly), by waiting until the the successful end of each routine, to increment *nr. Cc: Christoph Hellwig Cc: Aneesh Kumar K.V Signed-off-by: John Hubbard --- mm/gup.c | 113 ++++++++++++++++++++++--------------------------------- 1 file changed, 46 insertions(+), 67 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 85caf76b3012..8fb0d9cdfaf5 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1969,6 +1969,35 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr, } #endif +static int __record_subpages(struct page *page, unsigned long addr, + unsigned long end, struct page **pages, int nr) +{ + int nr_recorded_pages = 0; + + do { + pages[nr] = page; + nr++; + page++; + nr_recorded_pages++; + } while (addr += PAGE_SIZE, addr != end); + return nr_recorded_pages; +} + +static void __remove_refs_from_head(struct page *page, int refs) +{ + /* Do a get_page() first, in case refs == page->_refcount */ + get_page(page); + page_ref_sub(page, refs); + put_page(page); +} + +static int __huge_pt_done(struct page *head, int nr_recorded_pages, int *nr) +{ + *nr += nr_recorded_pages; + SetPageReferenced(head); + return 1; +} + #ifdef CONFIG_ARCH_HAS_HUGEPD static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, unsigned long sz) @@ -1998,34 +2027,19 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, /* hugepages are never "special" */ VM_BUG_ON(!pfn_valid(pte_pfn(pte))); - refs = 0; head = pte_page(pte); - page = head + ((addr & (sz-1)) >> PAGE_SHIFT); - do { - VM_BUG_ON(compound_head(page) != head); - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages, *nr); head = try_get_compound_head(head, refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pte_val(pte) != pte_val(*ptep))) { - /* Could be optimized better */ - *nr -= refs; - while (refs--) - put_page(head); + __remove_refs_from_head(head, refs); return 0; } - - SetPageReferenced(head); - return 1; + return __huge_pt_done(head, refs, nr); } static int gup_huge_pd(hugepd_t hugepd, unsigned long addr, @@ -2071,30 +2085,18 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, pages, nr); } - refs = 0; page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - do { - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages, *nr); head = try_get_compound_head(pmd_page(orig), refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - *nr -= refs; - while (refs--) - put_page(head); + __remove_refs_from_head(head, refs); return 0; } - - SetPageReferenced(head); - return 1; + return __huge_pt_done(head, refs, nr); } static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, @@ -2114,30 +2116,18 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, pages, nr); } - refs = 0; page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - do { - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages, *nr); head = try_get_compound_head(pud_page(orig), refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pud_val(orig) != pud_val(*pudp))) { - *nr -= refs; - while (refs--) - put_page(head); + __remove_refs_from_head(head, refs); return 0; } - - SetPageReferenced(head); - return 1; + return __huge_pt_done(head, refs, nr); } static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, @@ -2151,30 +2141,19 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, return 0; BUILD_BUG_ON(pgd_devmap(orig)); - refs = 0; + page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); - do { - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages, *nr); head = try_get_compound_head(pgd_page(orig), refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) { - *nr -= refs; - while (refs--) - put_page(head); + __remove_refs_from_head(head, refs); return 0; } - - SetPageReferenced(head); - return 1; + return __huge_pt_done(head, refs, nr); } static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, From patchwork Wed Oct 30 22:49:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220387 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E0D81747 for ; Wed, 30 Oct 2019 22:52:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0CC632067D for ; Wed, 30 Oct 2019 22:52:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="inYr1VKQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728223AbfJ3WwR (ORCPT ); Wed, 30 Oct 2019 18:52:17 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:1383 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727674AbfJ3Wtr (ORCPT ); Wed, 30 Oct 2019 18:49:47 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:44 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:38 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:38 -0700 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:38 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:37 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:36 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 03/19] goldish_pipe: rename local pin_user_pages() routine Date: Wed, 30 Oct 2019 15:49:14 -0700 Message-ID: <20191030224930.3990755-4-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475784; bh=BivA9QyOArPxchgp7rWZ/PJ/DTYMWoFVd+MwMn0rTgI=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=inYr1VKQQ+/f3AJ9PMbAavdzFyUp0bVR+a/Q7wotmLdt2JZrBhC2rs5jHrRwyYF0S aO5uYODOCX5dXT5/fs+gKo5RvdiQ5+sxSh7D3Hxo5OsdxO53oCSRqHcjMPA4dK6DaX j7xdlvX1z0cFoABxsiSAW7DLwV/tuHT/1Q0YfT9exr2qeN6LKFqMXaqyt7lL1U6jNy Sqv6+lsishbu4eLM9zaSsrnp8KygW5u0Vhshv1VfwG/76M1A1TEQPaWvkZw+gYHt+J UO7XZqZd7TeSpIgO4Cl76BDRmr6O9ZoMM4Wow6h3Vu5x3pHoolK6qXKxYtZ1fZdDm/ xHJvqckDB273g== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org 1. Avoid naming conflicts: rename local static function from "pin_user_pages()" to "pin_goldfish_pages()". An upcoming patch will introduce a global pin_user_pages() function. Signed-off-by: John Hubbard Reviewed-by: Ira Weiny --- drivers/platform/goldfish/goldfish_pipe.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c index cef0133aa47a..7ed2a21a0bac 100644 --- a/drivers/platform/goldfish/goldfish_pipe.c +++ b/drivers/platform/goldfish/goldfish_pipe.c @@ -257,12 +257,12 @@ static int goldfish_pipe_error_convert(int status) } } -static int pin_user_pages(unsigned long first_page, - unsigned long last_page, - unsigned int last_page_size, - int is_write, - struct page *pages[MAX_BUFFERS_PER_COMMAND], - unsigned int *iter_last_page_size) +static int pin_goldfish_pages(unsigned long first_page, + unsigned long last_page, + unsigned int last_page_size, + int is_write, + struct page *pages[MAX_BUFFERS_PER_COMMAND], + unsigned int *iter_last_page_size) { int ret; int requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1; @@ -354,9 +354,9 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe, if (mutex_lock_interruptible(&pipe->lock)) return -ERESTARTSYS; - pages_count = pin_user_pages(first_page, last_page, - last_page_size, is_write, - pipe->pages, &iter_last_page_size); + pages_count = pin_goldfish_pages(first_page, last_page, + last_page_size, is_write, + pipe->pages, &iter_last_page_size); if (pages_count < 0) { mutex_unlock(&pipe->lock); return pages_count; From patchwork Wed Oct 30 22:49:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220451 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C328A13B1 for ; Wed, 30 Oct 2019 22:53:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A115B2087F for ; Wed, 30 Oct 2019 22:53:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="TDqU50c8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727465AbfJ3Wtm (ORCPT ); Wed, 30 Oct 2019 18:49:42 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10798 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727250AbfJ3Wtm (ORCPT ); Wed, 30 Oct 2019 18:49:42 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:45 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:39 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:39 -0700 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:39 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:38 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:38 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 04/19] media/v4l2-core: set pages dirty upon releasing DMA buffers Date: Wed, 30 Oct 2019 15:49:15 -0700 Message-ID: <20191030224930.3990755-5-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475785; bh=XaBaipSsjoikJvkLH7RaOydN/WuSGfkJsZAiwvM/xPE=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=TDqU50c8TwyKmMgf/QDOTe3Vi1bVOitaLNonh0qiMoxrjojMp0wTfRux4Swk2FHl5 JJGYBcwT3ZevDeLVWf4S6QuZMQ0Lgc76irFx0ekQH8Fnv8kyfOshtxUwj3keS3tCY7 F5D1m4A2EQr/qED8mKHWtZbO0i+cEcxHpr70Q6RxWH8mk6sYhzR72caMlrH7U/2DJ9 cwN4vjQSS67t9JZPJ0DxjZGJhW0NmpX9Xjg5px4Z3HhCMFeEogkJ4W63rXbQ49reo0 Ta0Ru6XcEQojtc0AiLrAqhP/0gk5Gf5yaXkfGpmKAPqaQeMlSaBzLhwCOZG/jsUpVJ ChKFrWebViYAg== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org After DMA is complete, and the device and CPU caches are synchronized, it's still required to mark the CPU pages as dirty, if the data was coming from the device. However, this driver was just issuing a bare put_page() call, without any set_page_dirty*() call. Fix the problem, by calling set_page_dirty_lock() if the CPU pages were potentially receiving data from the device. Cc: Mauro Carvalho Chehab Signed-off-by: John Hubbard --- drivers/media/v4l2-core/videobuf-dma-sg.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 66a6c6c236a7..28262190c3ab 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -349,8 +349,11 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma) BUG_ON(dma->sglen); if (dma->pages) { - for (i = 0; i < dma->nr_pages; i++) + for (i = 0; i < dma->nr_pages; i++) { + if (dma->direction == DMA_FROM_DEVICE) + set_page_dirty_lock(dma->pages[i]); put_page(dma->pages[i]); + } kfree(dma->pages); dma->pages = NULL; } From patchwork Wed Oct 30 22:49:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220413 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 66D671599 for ; Wed, 30 Oct 2019 22:52:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 39F6D217F9 for ; Wed, 30 Oct 2019 22:52:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="nDqCtGuN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727666AbfJ3Wtq (ORCPT ); Wed, 30 Oct 2019 18:49:46 -0400 Received: from hqemgate15.nvidia.com ([216.228.121.64]:5252 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727614AbfJ3Wtp (ORCPT ); Wed, 30 Oct 2019 18:49:45 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:48 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:41 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:41 -0700 Received: from HQMAIL101.nvidia.com (172.20.187.10) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:40 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:39 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:39 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 05/19] mm/gup: introduce pin_user_pages*() and FOLL_PIN Date: Wed, 30 Oct 2019 15:49:16 -0700 Message-ID: <20191030224930.3990755-6-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475788; bh=q+XLxTTs3VV9NQYlAIix1/xq2y74kPKr4+k38NlDVWo=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=nDqCtGuNSvvDejx1vtY8xGgFUondHPHr+I+PvmvPI6LppRpXTXHrtb7uWRivWS9t6 4gF30LMcvxWujS+tccoegXL9+UAT8EcFEFJUzGwdMv7TO7uBB0pLyNV7K9kp4jNRTU KDw+KtQ6QpNILKiYWF7MADvsWEEcuISGTEcz1HM9CeQiMQSGUiott2qrunFLPKNFrY Ez3OuvWjeNqJUnEYJmm2FgcNA4VgmPRn9Om7+tLBDnD0yeaO+tuo/gfBWGhIMT+G9z nnSdDH5Y49JsX2G0spTf1fTfOpf2z/AbZxhgx3kJL9MUrCsqp58qgzhjBjY4/EhBAc W7g6XMs0i1myQ== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Introduce pin_user_pages*() variations of get_user_pages*() calls, and also pin_longterm_pages*() variations. These variants all set FOLL_PIN, which is also introduced, and basically documented. (An upcoming patch provides more extensive documentation.) The second set (pin_longterm*) also sets FOLL_LONGTERM: pin_user_pages() pin_user_pages_remote() pin_user_pages_fast() pin_longterm_pages() pin_longterm_pages_remote() pin_longterm_pages_fast() All pages that are pinned via the above calls, must be unpinned via put_user_page(). The underlying rules are: * These are gup-internal flags, so the call sites should not directly set FOLL_PIN nor FOLL_LONGTERM. That behavior is enforced with assertions, for the new FOLL_PIN flag. However, for the pre-existing FOLL_LONGTERM flag, which has some call sites that still directly set FOLL_LONGTERM, there is no assertion yet. * Call sites that want to indicate that they are going to do DirectIO ("DIO") or something with similar characteristics, should call a get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers will: * Start with "pin_user_pages" instead of "get_user_pages". That makes it easy to find and audit the call sites. * Set FOLL_PIN * For pages that are received via FOLL_PIN, those pages must be returned via put_user_page(). Signed-off-by: John Hubbard --- include/linux/mm.h | 53 ++++++++- mm/gup.c | 284 +++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 311 insertions(+), 26 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index cc292273e6ba..62c838a3e6c7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1526,9 +1526,23 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *locked); +long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas, int *locked); +long pin_longterm_pages_remote(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas, int *locked); long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); +long pin_user_pages(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas); +long pin_longterm_pages(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas); long get_user_pages_locked(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, int *locked); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, @@ -1536,6 +1550,10 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, int get_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages); +int pin_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages); +int pin_longterm_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages); int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, @@ -2594,13 +2612,15 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_ANON 0x8000 /* don't do file mappings */ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ +#define FOLL_PIN 0x40000 /* pages must be released via put_user_page() */ /* - * NOTE on FOLL_LONGTERM: + * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each + * other. Here is what they mean, and how to use them: * * FOLL_LONGTERM indicates that the page will be held for an indefinite time - * period _often_ under userspace control. This is contrasted with - * iov_iter_get_pages() where usages which are transient. + * period _often_ under userspace control. This is in contrast to + * iov_iter_get_pages(), where usages which are transient. * * FIXME: For pages which are part of a filesystem, mappings are subject to the * lifetime enforced by the filesystem and we need guarantees that longterm @@ -2615,11 +2635,32 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * Currently only get_user_pages() and get_user_pages_fast() support this flag * and calls to get_user_pages_[un]locked are specifically not allowed. This * is due to an incompatibility with the FS DAX check and - * FAULT_FLAG_ALLOW_RETRY + * FAULT_FLAG_ALLOW_RETRY. * - * In the CMA case: longterm pins in a CMA region would unnecessarily fragment - * that region. And so CMA attempts to migrate the page before pinning when + * In the CMA case: long term pins in a CMA region would unnecessarily fragment + * that region. And so, CMA attempts to migrate the page before pinning, when * FOLL_LONGTERM is specified. + * + * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount, + * but an additional pin counting system) will be invoked. This is intended for + * anything that gets a page reference and then touches page data (for example, + * Direct IO). This lets the filesystem know that some non-file-system entity is + * potentially changing the pages' data. In contrast to FOLL_GET (whose pages + * are released via put_page()), FOLL_PIN pages must be released, ultimately, by + * a call to put_user_page(). + * + * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different + * and separate refcounting mechanisms, however, and that means that each has + * its own acquire and release mechanisms: + * + * FOLL_GET: get_user_pages*() to acquire, and put_page() to release. + * + * FOLL_PIN: pin_user_pages*() or pin_longterm_pages*() to acquire, and + * put_user_pages to release. + * + * FOLL_PIN and FOLL_GET are mutually exclusive. + * + * Please see Documentation/vm/pin_user_pages.rst for more information. */ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags) diff --git a/mm/gup.c b/mm/gup.c index 8fb0d9cdfaf5..8694bc7b3df3 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -179,6 +179,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, spinlock_t *ptl; pte_t *ptep, pte; + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return ERR_PTR(-EINVAL); retry: if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); @@ -790,7 +794,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, start = untagged_addr(start); - VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET)); + VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN))); /* * If FOLL_FORCE is set then do not force a full fault as the hinting @@ -1014,7 +1018,16 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, BUG_ON(*locked != 1); } - if (pages) + /* + * FOLL_PIN and FOLL_GET are mutually exclusive. Traditional behavior + * is to set FOLL_GET if the caller wants pages[] filled in (but has + * carelessly failed to specify FOLL_GET), so keep doing that, but only + * for FOLL_GET, not for the newer FOLL_PIN. + * + * FOLL_PIN always expects pages to be non-null, but no need to assert + * that here, as any failures will be obvious enough. + */ + if (pages && !(flags & FOLL_PIN)) flags |= FOLL_GET; pages_done = 0; @@ -1133,6 +1146,12 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, * is written to, set_page_dirty (or set_page_dirty_lock, as appropriate) must * be called after the page is finished with, and before put_page is called. * + * A note on gup_flags: FOLL_PIN must only be set internally by the + * pin_user_page*() and pin_longterm_*() APIs, never directly by the caller. + * That's in order to help avoid mismatches when releasing pages: + * get_user_pages*() pages must be released via put_page(), while + * pin_user_pages*() pages must be released via put_user_page(). + * * get_user_pages is typically used for fewer-copy IO operations, to get a * handle on the memory by some means other than accesses via the user virtual * addresses. The pages may be submitted for DMA to devices or accessed via @@ -1151,6 +1170,14 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *locked) { + /* + * As detailed above, FOLL_PIN must only be set internally by the + * pin_user_page*() and pin_longterm_*() APIs, never directly by the + * caller, so enforce that with an assertion: + */ + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) + return -EINVAL; + /* * FIXME: Current FOLL_LONGTERM behavior is incompatible with * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on @@ -1603,11 +1630,25 @@ static __always_inline long __gup_longterm_locked(struct task_struct *tsk, * and mm being operated on are the current task's and don't allow * passing of a locked parameter. We also obviously don't pass * FOLL_REMOTE in here. + * + * A note on gup_flags: FOLL_PIN should only be set internally by the + * pin_user_page*() and pin_longterm_*() APIs, never directly by the caller. + * That's in order to help avoid mismatches when releasing pages: + * get_user_pages*() pages must be released via put_page(), while + * pin_user_pages*() pages must be released via put_user_page(). */ long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas) { + /* + * As detailed above, FOLL_PIN must only be set internally by the + * pin_user_page*() and pin_longterm_*() APIs, never directly by the + * caller, so enforce that with an assertion: + */ + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) + return -EINVAL; + return __gup_longterm_locked(current, current->mm, start, nr_pages, pages, vmas, gup_flags | FOLL_TOUCH); } @@ -2366,24 +2407,9 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages, return ret; } -/** - * get_user_pages_fast() - pin user pages in memory - * @start: starting user address - * @nr_pages: number of pages from start to pin - * @gup_flags: flags modifying pin behaviour - * @pages: array that receives pointers to the pages pinned. - * Should be at least nr_pages long. - * - * Attempt to pin user pages in memory without taking mm->mmap_sem. - * If not successful, it will fall back to taking the lock and - * calling get_user_pages(). - * - * Returns number of pages pinned. This may be fewer than the number - * requested. If nr_pages is 0 or negative, returns 0. If no pages - * were pinned, returns -errno. - */ -int get_user_pages_fast(unsigned long start, int nr_pages, - unsigned int gup_flags, struct page **pages) +static int internal_get_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, + struct page **pages) { unsigned long addr, len, end; int nr = 0, ret = 0; @@ -2428,4 +2454,222 @@ int get_user_pages_fast(unsigned long start, int nr_pages, return ret; } + +/** + * get_user_pages_fast() - pin user pages in memory + * @start: starting user address + * @nr_pages: number of pages from start to pin + * @gup_flags: flags modifying pin behaviour + * @pages: array that receives pointers to the pages pinned. + * Should be at least nr_pages long. + * + * Attempt to pin user pages in memory without taking mm->mmap_sem. + * If not successful, it will fall back to taking the lock and + * calling get_user_pages(). + * + * A note on gup_flags: FOLL_PIN must only be set internally by the + * pin_user_page*() and pin_longterm_*() APIs, never directly by the caller. + * That's in order to help avoid mismatches when releasing pages: + * get_user_pages*() pages must be released via put_page(), while + * pin_user_pages*() pages must be released via put_user_page(). + * + * Returns number of pages pinned. This may be fewer than the number requested. + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns + * -errno. + */ +int get_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages) +{ + /* + * As detailed above, FOLL_PIN must only be set internally by the + * pin_user_page*() and pin_longterm_*() APIs, never directly by the + * caller, so enforce that: + */ + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) + return -EINVAL; + + return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages); +} EXPORT_SYMBOL_GPL(get_user_pages_fast); + +/** + * pin_user_pages_fast() - pin user pages in memory without taking locks + * + * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is set. See + * get_user_pages_fast() for documentation on the function arguments, because + * the arguments here are identical. + * + * FOLL_PIN means that the pages must be released via put_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for further details. + * + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It + * is NOT intended for Case 2 (RDMA: long-term pins). + */ +int pin_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages) +{ + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= FOLL_PIN; + return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages); +} +EXPORT_SYMBOL_GPL(pin_user_pages_fast); + +/** + * pin_longterm_pages_fast() - pin user pages in memory without taking locks + * + * Nearly the same as get_user_pages_fast(), except that FOLL_PIN and + * FOLL_LONGTERM are set. See get_user_pages_fast() for documentation on the + * function arguments, because the arguments here are identical. + * + * FOLL_PIN means that the pages must be released via put_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for further details. + * + * FOLL_LONGTERM means that the pages are being pinned for "long term" use, + * typically by a non-CPU device, and we cannot be sure that waiting for a + * pinned page to become unpin will be effective. + * + * This is intended for Case 2 (RDMA: long-term pins) of the FOLL_PIN + * documentation. + */ +int pin_longterm_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages) +{ + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= (FOLL_PIN | FOLL_LONGTERM); + return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages); +} +EXPORT_SYMBOL_GPL(pin_longterm_pages_fast); + +/** + * pin_user_pages_remote() - pin pages for (typically) use by Direct IO, and + * return the pages to the user. + * + * Nearly the same as get_user_pages_remote(), except that FOLL_PIN is set. See + * get_user_pages_remote() for documentation on the function arguments, because + * the arguments here are identical. + * + * FOLL_PIN means that the pages must be released via put_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for details. + * + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It + * is NOT intended for Case 2 (RDMA: long-term pins). + */ +long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas, int *locked) +{ + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= FOLL_TOUCH | FOLL_REMOTE | FOLL_PIN; + + return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, + locked, gup_flags); +} +EXPORT_SYMBOL(pin_user_pages_remote); + +/** + * pin_longterm_pages_remote() - pin pages for (typically) use by Direct IO, and + * return the pages to the user. + * + * Nearly the same as get_user_pages_remote(), but note that FOLL_TOUCH is not + * set, and FOLL_PIN and FOLL_LONGTERM are set. See get_user_pages_remote() for + * documentation on the function arguments, because the arguments here are + * identical. + * + * FOLL_PIN means that the pages must be released via put_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for further details. + * + * FOLL_LONGTERM means that the pages are being pinned for "long term" use, + * typically by a non-CPU device, and we cannot be sure that waiting for a + * pinned page to become unpin will be effective. + * + * This is intended for Case 2 (RDMA: long-term pins) in + * Documentation/vm/pin_user_pages.rst. + */ +long pin_longterm_pages_remote(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas, int *locked) +{ + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + /* + * FIXME: as noted in the get_user_pages_remote() implementation, it + * is not yet possible to safely set FOLL_LONGTERM here. FOLL_LONGTERM + * needs to be set, but for now the best we can do is a "TODO" item. + */ + gup_flags |= FOLL_REMOTE | FOLL_PIN; + + return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, + locked, gup_flags); +} +EXPORT_SYMBOL(pin_longterm_pages_remote); + +/** + * pin_user_pages() - pin user pages in memory for use by other devices + * + * Nearly the same as get_user_pages(), except that FOLL_TOUCH is not set, and + * FOLL_PIN is set. + * + * FOLL_PIN means that the pages must be released via put_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for details. + * + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It + * is NOT intended for Case 2 (RDMA: long-term pins). + */ +long pin_user_pages(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas) +{ + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= FOLL_PIN; + return __gup_longterm_locked(current, current->mm, start, nr_pages, + pages, vmas, gup_flags); +} +EXPORT_SYMBOL(pin_user_pages); + +/** + * pin_longterm_pages() - pin user pages in memory for long-term use (RDMA, + * typically) + * + * Nearly the same as get_user_pages(), except that FOLL_PIN and FOLL_LONGTERM + * are set. See get_user_pages_fast() for documentation on the function + * arguments, because the arguments here are identical. + * + * FOLL_PIN means that the pages must be released via put_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for further details. + * + * FOLL_LONGTERM means that the pages are being pinned for "long term" use, + * typically by a non-CPU device, and we cannot be sure that waiting for a + * pinned page to become unpin will be effective. + * + * This is intended for Case 2 (RDMA: long-term pins) in + * Documentation/vm/pin_user_pages.rst. + */ +long pin_longterm_pages(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas) +{ + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= FOLL_PIN | FOLL_LONGTERM; + return __gup_longterm_locked(current, current->mm, start, nr_pages, + pages, vmas, gup_flags); +} +EXPORT_SYMBOL(pin_longterm_pages); From patchwork Wed Oct 30 22:49:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220361 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED31B14DB for ; Wed, 30 Oct 2019 22:52:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CBFC22087E for ; Wed, 30 Oct 2019 22:52:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="peTnoPWM" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728128AbfJ3Wv4 (ORCPT ); Wed, 30 Oct 2019 18:51:56 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10853 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727684AbfJ3Wts (ORCPT ); Wed, 30 Oct 2019 18:49:48 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:47 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:42 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:42 -0700 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:41 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:40 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:40 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 06/19] goldish_pipe: convert to pin_user_pages() and put_user_page() Date: Wed, 30 Oct 2019 15:49:17 -0700 Message-ID: <20191030224930.3990755-7-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475787; bh=w1mGkll55W/F61ikcVy7cICFWorcB7JiCtRQJEe0514=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=peTnoPWML8WCJAUu8vvOzOvbfQQErovk+TwLtVZgw6WFjHDzECbbi8VfezEKC0nUb oXVTkAZI3Pql6PtBqLYGZ7aJjNZR3R9OUybtwSpMXJTlJaX5zAO5D7bhEP/IdgL7Gq 2Dsi9XUoaFagZBsoMTaUXUQH1cRFS+2d7LM6ywVFaN4KvOe7841rUdbANPYeyjIz/x yfdslwhoUrsu0jtrilAjuu4ea6C5Rj52zqv1cItlim2f/+U1fa4dft9nx846RvZrS9 whgQbKGIIEIwLKNT+yUksO/PfTG5Tiu3Cyf3wLtvqqD5Ql68s+IbjsZL7KiMqEF83R 1TsgH97tFJpgg== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org 1. Call the new global pin_user_pages_fast(), from pin_goldfish_pages(). 2. As required by pin_user_pages(), release these pages via put_user_page(). In this case, do so via put_user_pages_dirty_lock(). That has the side effect of calling set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] Another side effect is that the release code is simplified because the page[] loop is now in gup.c instead of here, so just delete the local release_user_pages() entirely, and call put_user_pages_dirty_lock() directly, instead. [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de Signed-off-by: John Hubbard --- drivers/platform/goldfish/goldfish_pipe.c | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c index 7ed2a21a0bac..635a8bc1b480 100644 --- a/drivers/platform/goldfish/goldfish_pipe.c +++ b/drivers/platform/goldfish/goldfish_pipe.c @@ -274,7 +274,7 @@ static int pin_goldfish_pages(unsigned long first_page, *iter_last_page_size = last_page_size; } - ret = get_user_pages_fast(first_page, requested_pages, + ret = pin_user_pages_fast(first_page, requested_pages, !is_write ? FOLL_WRITE : 0, pages); if (ret <= 0) @@ -285,18 +285,6 @@ static int pin_goldfish_pages(unsigned long first_page, return ret; } -static void release_user_pages(struct page **pages, int pages_count, - int is_write, s32 consumed_size) -{ - int i; - - for (i = 0; i < pages_count; i++) { - if (!is_write && consumed_size > 0) - set_page_dirty(pages[i]); - put_page(pages[i]); - } -} - /* Populate the call parameters, merging adjacent pages together */ static void populate_rw_params(struct page **pages, int pages_count, @@ -372,7 +360,8 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe, *consumed_size = pipe->command_buffer->rw_params.consumed_size; - release_user_pages(pipe->pages, pages_count, is_write, *consumed_size); + put_user_pages_dirty_lock(pipe->pages, pages_count, + !is_write && *consumed_size > 0); mutex_unlock(&pipe->lock); return 0; From patchwork Wed Oct 30 22:49:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220341 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EDB0113B1 for ; Wed, 30 Oct 2019 22:51:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B85E22067D for ; Wed, 30 Oct 2019 22:51:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="VYzCcB0x" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728097AbfJ3Wvl (ORCPT ); Wed, 30 Oct 2019 18:51:41 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:1424 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727614AbfJ3Wtv (ORCPT ); Wed, 30 Oct 2019 18:49:51 -0400 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:49 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:43 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 30 Oct 2019 15:49:43 -0700 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:42 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:42 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:41 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 07/19] infiniband: set FOLL_PIN, FOLL_LONGTERM via pin_longterm_pages*() Date: Wed, 30 Oct 2019 15:49:18 -0700 Message-ID: <20191030224930.3990755-8-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475789; bh=DXHsiJq7l9PXxJGsaitK1WD9ceosiXZuRc8Rt0sYOTo=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=VYzCcB0xgHaGOKtSGQv4L5IsJwx31xRMSf3LX3V6v9Wx3Id206JEMDUUF99xa4w7m Mp14nBvM6DEo/uWK5j5BqH2j5jJrp2/+FQOxvAkkiu8O2lLAvWhk3ZDIW+8Ewf3n9G bQcdQdVvxa4tw7LxTDNkDtP+Q2oHLhhEQAVU+XZyDB6VKqJDdtmE+qT0CPZpoYDBb5 EfFFFawBbjd7cw1gJyyLxoE4xctJxQJtUeovlmWVQxMATDv/KorKOkUPvkgDz4t2Vo DlDxh6OEdO6uLZZzwRtI8IBNXkx8nJqgNH3MmUiTzYfzo1zC60NzPkiaFUDvXEfEIu WAxwtR0kljp6Q== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Convert infiniband to use the new wrapper calls, and stop explicitly setting FOLL_LONGTERM at the call sites. The new pin_longterm_*() calls replace get_user_pages*() calls, and set both FOLL_LONGTERM and a new FOLL_PIN flag. The FOLL_PIN flag requires that the caller must return the pages via put_user_page*() calls, but infiniband was already doing that as part of an earlier commit. Signed-off-by: John Hubbard Reviewed-by: Ira Weiny --- drivers/infiniband/core/umem.c | 5 ++--- drivers/infiniband/core/umem_odp.c | 10 +++++----- drivers/infiniband/hw/hfi1/user_pages.c | 4 ++-- drivers/infiniband/hw/mthca/mthca_memfree.c | 3 +-- drivers/infiniband/hw/qib/qib_user_pages.c | 8 ++++---- drivers/infiniband/hw/qib/qib_user_sdma.c | 2 +- drivers/infiniband/hw/usnic/usnic_uiom.c | 9 ++++----- drivers/infiniband/sw/siw/siw_mem.c | 5 ++--- 8 files changed, 21 insertions(+), 25 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 24244a2f68cc..c5a78d3e674b 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -272,11 +272,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, while (npages) { down_read(&mm->mmap_sem); - ret = get_user_pages(cur_base, + ret = pin_longterm_pages(cur_base, min_t(unsigned long, npages, PAGE_SIZE / sizeof (struct page *)), - gup_flags | FOLL_LONGTERM, - page_list, NULL); + gup_flags, page_list, NULL); if (ret < 0) { up_read(&mm->mmap_sem); goto umem_release; diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 163ff7ba92b7..a38b67b83db5 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -534,7 +534,7 @@ static int ib_umem_odp_map_dma_single_page( } else if (umem_odp->page_list[page_index] == page) { umem_odp->dma_list[page_index] |= access_mask; } else { - pr_err("error: got different pages in IB device and from get_user_pages. IB device page: %p, gup page: %p\n", + pr_err("error: got different pages in IB device and from pin_longterm_pages. IB device page: %p, gup page: %p\n", umem_odp->page_list[page_index], page); /* Better remove the mapping now, to prevent any further * damage. */ @@ -639,11 +639,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, /* * Note: this might result in redundent page getting. We can * avoid this by checking dma_list to be 0 before calling - * get_user_pages. However, this make the code much more - * complex (and doesn't gain us much performance in most use - * cases). + * pin_longterm_pages. However, this makes the code much + * more complex (and doesn't gain us much performance in most + * use cases). */ - npages = get_user_pages_remote(owning_process, owning_mm, + npages = pin_longterm_pages_remote(owning_process, owning_mm, user_virt, gup_num_pages, flags, local_page_list, NULL, NULL); up_read(&owning_mm->mmap_sem); diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c index 469acb961fbd..9b55b0a73e29 100644 --- a/drivers/infiniband/hw/hfi1/user_pages.c +++ b/drivers/infiniband/hw/hfi1/user_pages.c @@ -104,9 +104,9 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np bool writable, struct page **pages) { int ret; - unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0); + unsigned int gup_flags = (writable ? FOLL_WRITE : 0); - ret = get_user_pages_fast(vaddr, npages, gup_flags, pages); + ret = pin_longterm_pages_fast(vaddr, npages, gup_flags, pages); if (ret < 0) return ret; diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index edccfd6e178f..beec7e4b8a96 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -472,8 +472,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, goto out; } - ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, - FOLL_WRITE | FOLL_LONGTERM, pages); + ret = pin_longterm_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages); if (ret < 0) goto out; diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index 6bf764e41891..684a14e14d9b 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -108,10 +108,10 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages, down_read(¤t->mm->mmap_sem); for (got = 0; got < num_pages; got += ret) { - ret = get_user_pages(start_page + got * PAGE_SIZE, - num_pages - got, - FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE, - p + got, NULL); + ret = pin_longterm_pages(start_page + got * PAGE_SIZE, + num_pages - got, + FOLL_WRITE | FOLL_FORCE, + p + got, NULL); if (ret < 0) { up_read(¤t->mm->mmap_sem); goto bail_release; diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c index 05190edc2611..fd86a9d19370 100644 --- a/drivers/infiniband/hw/qib/qib_user_sdma.c +++ b/drivers/infiniband/hw/qib/qib_user_sdma.c @@ -670,7 +670,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd, else j = npages; - ret = get_user_pages_fast(addr, j, FOLL_LONGTERM, pages); + ret = pin_longterm_pages_fast(addr, j, 0, pages); if (ret != j) { i = 0; j = ret; diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 62e6ffa9ad78..6b90ca1c3771 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -141,11 +141,10 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, ret = 0; while (npages) { - ret = get_user_pages(cur_base, - min_t(unsigned long, npages, - PAGE_SIZE / sizeof(struct page *)), - gup_flags | FOLL_LONGTERM, - page_list, NULL); + ret = pin_longterm_pages(cur_base, + min_t(unsigned long, npages, + PAGE_SIZE / sizeof(struct page *)), + gup_flags, page_list, NULL); if (ret < 0) goto out; diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c index e99983f07663..20e663d7ada8 100644 --- a/drivers/infiniband/sw/siw/siw_mem.c +++ b/drivers/infiniband/sw/siw/siw_mem.c @@ -426,9 +426,8 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable) while (nents) { struct page **plist = &umem->page_chunk[i].plist[got]; - rv = get_user_pages(first_page_va, nents, - foll_flags | FOLL_LONGTERM, - plist, NULL); + rv = pin_longterm_pages(first_page_va, nents, + foll_flags, plist, NULL); if (rv < 0) goto out_sem_up; From patchwork Wed Oct 30 22:49:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220381 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 21C7E14DB for ; Wed, 30 Oct 2019 22:52:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E93952087E for ; Wed, 30 Oct 2019 22:52:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="T1okUU6x" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728216AbfJ3WwR (ORCPT ); Wed, 30 Oct 2019 18:52:17 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10846 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727670AbfJ3Wtr (ORCPT ); Wed, 30 Oct 2019 18:49:47 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:50 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:44 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:44 -0700 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:44 +0000 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:44 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:43 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:43 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 08/19] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() Date: Wed, 30 Oct 2019 15:49:19 -0700 Message-ID: <20191030224930.3990755-9-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475790; bh=mlrZ/rix4TWnvu0gWBO75CbhcMvz3QaPMdjHmepPots=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=T1okUU6xEFiVtVeoQJE96GgMlaT5QHQCTSORP2lhkD8eLigr7DeidXJDkKlLMKjgp suyl0vSyxruTx+5wktlA+v73ygMrBWwhZ3suojNDWt8GwtWUHRbfxxcqePdMFOnqWH 8pF9BGDuXtc+s13A8v+IiYxTtUcRGn3/Fda9cflCAEhL21x9avtCRB0/4GlQ7qoVNT IViDVH9dh3Vl7mArrvAZ3LKr0E9JZPsdHm0+SZQNP+nGbAmrzMotnN6uW5fgYzKzEn VsXqvyhgyE3u3T7p+IDCeRBQTg7e+DvgXTT/NjnEP9pYQZ1j2C/gu/xXXFcdv1lmJj cKzWDdboK7PPQ== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Convert process_vm_access to use the new pin_user_pages_remote() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. Also, release the pages via put_user_page*(). Also, rename "pages" to "pinned_pages", as this makes for easier reading of process_vm_rw_single_vec(). Signed-off-by: John Hubbard Reviewed-by: Ira Weiny --- mm/process_vm_access.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c index 357aa7bef6c0..fd20ab675b85 100644 --- a/mm/process_vm_access.c +++ b/mm/process_vm_access.c @@ -42,12 +42,11 @@ static int process_vm_rw_pages(struct page **pages, if (copy > len) copy = len; - if (vm_write) { + if (vm_write) copied = copy_page_from_iter(page, offset, copy, iter); - set_page_dirty_lock(page); - } else { + else copied = copy_page_to_iter(page, offset, copy, iter); - } + len -= copied; if (copied < copy && iov_iter_count(iter)) return -EFAULT; @@ -96,7 +95,7 @@ static int process_vm_rw_single_vec(unsigned long addr, flags |= FOLL_WRITE; while (!rc && nr_pages && iov_iter_count(iter)) { - int pages = min(nr_pages, max_pages_per_loop); + int pinned_pages = min(nr_pages, max_pages_per_loop); int locked = 1; size_t bytes; @@ -106,14 +105,15 @@ static int process_vm_rw_single_vec(unsigned long addr, * current/current->mm */ down_read(&mm->mmap_sem); - pages = get_user_pages_remote(task, mm, pa, pages, flags, - process_pages, NULL, &locked); + pinned_pages = pin_user_pages_remote(task, mm, pa, pinned_pages, + flags, process_pages, + NULL, &locked); if (locked) up_read(&mm->mmap_sem); - if (pages <= 0) + if (pinned_pages <= 0) return -EFAULT; - bytes = pages * PAGE_SIZE - start_offset; + bytes = pinned_pages * PAGE_SIZE - start_offset; if (bytes > len) bytes = len; @@ -122,10 +122,12 @@ static int process_vm_rw_single_vec(unsigned long addr, vm_write); len -= bytes; start_offset = 0; - nr_pages -= pages; - pa += pages * PAGE_SIZE; - while (pages) - put_page(process_pages[--pages]); + nr_pages -= pinned_pages; + pa += pinned_pages * PAGE_SIZE; + + /* If vm_write is set, the pages need to be made dirty: */ + put_user_pages_dirty_lock(process_pages, pinned_pages, + vm_write); } return rc; From patchwork Wed Oct 30 22:49:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220383 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECE3414DB for ; Wed, 30 Oct 2019 22:52:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CB7F02087F for ; Wed, 30 Oct 2019 22:52:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="rBaGQRb4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728229AbfJ3WwS (ORCPT ); Wed, 30 Oct 2019 18:52:18 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:1381 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727669AbfJ3Wtr (ORCPT ); Wed, 30 Oct 2019 18:49:47 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:51 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:45 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:45 -0700 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:45 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:44 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:44 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 09/19] drm/via: set FOLL_PIN via pin_user_pages_fast() Date: Wed, 30 Oct 2019 15:49:20 -0700 Message-ID: <20191030224930.3990755-10-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475791; bh=KULLdQxfsT2bXWOpKJJJBCXzN+AAEPF3Hl+zS24dwOk=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=rBaGQRb48zXDoKjfl+INAbxehtvkzCPWhvPWOtjLKNzv/2LXt5ywMx69S+9ql629U oKYp8WKCTlZ9T9FGJVrUkHhErffdc55PxDjRxHG38xC//OXQpsJeo0SD0tIRmX2kXp 2KQpmyip5UpAVZL6Bg0pDfrAPZRr3aoWTSDDkQ4jp5+M8azRSCKtMO+9H3KOG+sK9q MsbgnZ3rEvpahd1TnbI6fWIGmPi9sFC4odC6rlU/HM9vXIiv1AePpRTgeIY0GMlFAO MYeLzf4oiyG0tV7GmQQavrvB0nhBtf3XXcZhR2hthXyKRYb69Vx++cs1RstACfO881 P/pnTsmiXgY+w== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Convert drm/via to use the new pin_user_pages_fast() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). Signed-off-by: John Hubbard Reviewed-by: Ira Weiny Acked-by: Daniel Vetter --- drivers/gpu/drm/via/via_dmablit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c index 3db000aacd26..37c5e572993a 100644 --- a/drivers/gpu/drm/via/via_dmablit.c +++ b/drivers/gpu/drm/via/via_dmablit.c @@ -239,7 +239,7 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg, drm_via_dmablit_t *xfer) vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages)); if (NULL == vsg->pages) return -ENOMEM; - ret = get_user_pages_fast((unsigned long)xfer->mem_addr, + ret = pin_user_pages_fast((unsigned long)xfer->mem_addr, vsg->num_pages, vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0, vsg->pages); From patchwork Wed Oct 30 22:49:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220197 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1DE5B17D5 for ; Wed, 30 Oct 2019 22:49:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F0BAF222C6 for ; Wed, 30 Oct 2019 22:49:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="kR+tPouB" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727718AbfJ3Wtw (ORCPT ); Wed, 30 Oct 2019 18:49:52 -0400 Received: from hqemgate15.nvidia.com ([216.228.121.64]:5281 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727695AbfJ3Wtu (ORCPT ); Wed, 30 Oct 2019 18:49:50 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:54 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:47 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:47 -0700 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:47 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:45 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:45 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 10/19] fs/io_uring: set FOLL_PIN via pin_user_pages() Date: Wed, 30 Oct 2019 15:49:21 -0700 Message-ID: <20191030224930.3990755-11-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475794; bh=bbL8FR4bzETk+x2+h29SWzIPj5uQjRkW4dL7+03upow=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=kR+tPouBYC0ki827JDMeuNTByE9oIS5IDkovYzuvxCfKraeRE1xhbD+GOuKjda4Em wTplOT5bMiWa8BiaEHL3DKSpG8hqxFlrmCesFeMnbUYq9P1VfB5Aknw2LfXHZadx61 XIn5Hk9/d25I2iGTdlvgsBAJtZ42RCNbPATO9yzUrOBld+IU2VDQsYVK6KL9TsCU6a YxhKY4Qm/Nq4yZ1x82Ms8dt1DPfBsg0RcS/BQFOIl5tkHUes2ShpUz9fJI0YZorwvG K9BJvrNgryOUaUnfVnmo/oNPgkf3tG6AdeuQiOhCRbBpph8T9Ytab2k7nIUs3cE6Z5 zqKFG6B2aYtEA== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Convert fs/io_uring to use the new pin_user_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). Signed-off-by: John Hubbard Reviewed-by: Ira Weiny Reviewed-by: Jens Axboe --- fs/io_uring.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index a30c4f622cb3..d3924b1760eb 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3431,9 +3431,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, ret = 0; down_read(¤t->mm->mmap_sem); - pret = get_user_pages(ubuf, nr_pages, - FOLL_WRITE | FOLL_LONGTERM, - pages, vmas); + pret = pin_longterm_pages(ubuf, nr_pages, FOLL_WRITE, pages, + vmas); if (pret == nr_pages) { /* don't support file backed memory */ for (j = 0; j < nr_pages; j++) { From patchwork Wed Oct 30 22:49:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220327 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7B93B14DB for ; Wed, 30 Oct 2019 22:51:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 58B7121925 for ; Wed, 30 Oct 2019 22:51:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="fdOxbJQy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728057AbfJ3Wva (ORCPT ); Wed, 30 Oct 2019 18:51:30 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:1431 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727707AbfJ3Wtw (ORCPT ); Wed, 30 Oct 2019 18:49:52 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:54 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:48 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:48 -0700 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:47 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:47 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:46 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 11/19] net/xdp: set FOLL_PIN via pin_user_pages() Date: Wed, 30 Oct 2019 15:49:22 -0700 Message-ID: <20191030224930.3990755-12-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475794; bh=6zVaU0nwyoWX36/UsrispKaf6jXlsi9COTha1w5b3+c=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=fdOxbJQyxpRW64nrFgwRdhO3mo5SwoQ5q48VuUc9KVQjVcrUXwDJReSwqLqKupuOs doRFQDchRAr17O/KF16ZnzGoeOYxA90xt9ubynWodjLn2YKKrrf9t87Bc77Y9+fKJA VutZgUswKIMmzlIqhlGi/WUcuJneSj5VbhvxTsTxKwRxP+T/K9RAGAn8c4bu6HUsac Mzkv6sVgASiq/XtOISyos/vmKoH3o1RyTsa0/11Vp4RxU0zF8npVToPrcDW2aslO5M OKFSuEXyZLcHuNxFYsdbZ6ejmqTSXcLlx0XX7lwMZTGEE0Wv8/wUrH5DHy2Pvyt6nG q+o8Fflinnl5Q== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Convert net/xdp to use the new pin_longterm_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. Signed-off-by: John Hubbard Reviewed-by: Ira Weiny Acked-by: Björn Töpel --- net/xdp/xdp_umem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 16d5f353163a..4d56dfb1139a 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -285,8 +285,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem) return -ENOMEM; down_read(¤t->mm->mmap_sem); - npgs = get_user_pages(umem->address, umem->npgs, - gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL); + npgs = pin_longterm_pages(umem->address, umem->npgs, gup_flags, + &umem->pgs[0], NULL); up_read(¤t->mm->mmap_sem); if (npgs != umem->npgs) { From patchwork Wed Oct 30 22:49:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220323 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4AC8613B1 for ; Wed, 30 Oct 2019 22:51:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 13EDE21929 for ; Wed, 30 Oct 2019 22:51:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="hKlTHK/N" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728028AbfJ3WvT (ORCPT ); Wed, 30 Oct 2019 18:51:19 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10876 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727705AbfJ3Wtw (ORCPT ); Wed, 30 Oct 2019 18:49:52 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:55 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:50 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:50 -0700 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:49 +0000 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:49 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:48 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:48 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 12/19] mm/gup: track FOLL_PIN pages Date: Wed, 30 Oct 2019 15:49:23 -0700 Message-ID: <20191030224930.3990755-13-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475795; bh=QB4piIeYSHaPr0lrpZOxR61169WQMbaj1LD/Isgokvk=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=hKlTHK/NVuiQus97TKJyqcTj1f6CAJWw1pFJrMnif42P4H7Mf0kfc2jVBSSt5cF0G I0fx8nn/hj3tMygwTBzda0fwGJpK1NdsJAaYGHjjmuux7fMTo2gpp0ULcs95R2EwqM 9cEfJnNeNiHbw0SIeDP5XeVqtzy3AUxVJVFFO5gWFYGvKZMXaic1UFr0YrJiaagjKp lSX/YIZSjaYKisC9jZTQ4dksmQoTgGGMmW5FRmAoEfxkPpmswokRi483/0aT6EIQlZ Exd+DWfHU8arMp6cX+Nrcd7EGmrU/AN3KT2HcEhKd4StqwUpFrPdLTtkypuaBOJ8ZX ZWEIANvWEZT/g== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Add tracking of pages that were pinned via FOLL_PIN. As mentioned in the FOLL_PIN documentation, callers who effectively set FOLL_PIN are required to ultimately free such pages via put_user_page(). The effect is similar to FOLL_GET, and may be thought of as "FOLL_GET for DIO and/or RDMA use". Pages that have been pinned via FOLL_PIN are identifiable via a new function call: bool page_dma_pinned(struct page *page); What to do in response to encountering such a page, is left to later patchsets. There is discussion about this in [1]. This also changes a BUG_ON(), to a WARN_ON(), in follow_page_mask(). This also has a couple of trivial, non-functional change fixes to try_get_compound_head(). That function got moved to the top of the file. This includes the following fix from Ira Weiny: DAX requires detection of a page crossing to a ref count of 1. Fix this for GUP pages by introducing put_devmap_managed_user_page() which accounts for GUP_PIN_COUNTING_BIAS now used by GUP. [1] https://lwn.net/Articles/784574/ "Some slow progress on get_user_pages()" Suggested-by: Jan Kara Suggested-by: Jérôme Glisse Signed-off-by: Ira Weiny Signed-off-by: John Hubbard --- include/linux/mm.h | 80 +++++++++++---- include/linux/mmzone.h | 2 + include/linux/page_ref.h | 10 ++ mm/gup.c | 213 +++++++++++++++++++++++++++++++-------- mm/huge_memory.c | 32 +++++- mm/hugetlb.c | 28 ++++- mm/memremap.c | 4 +- mm/vmstat.c | 2 + 8 files changed, 300 insertions(+), 71 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 62c838a3e6c7..882fda919c81 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -972,9 +972,10 @@ static inline bool is_zone_device_page(const struct page *page) #endif #ifdef CONFIG_DEV_PAGEMAP_OPS -void __put_devmap_managed_page(struct page *page); +void __put_devmap_managed_page(struct page *page, int count); DECLARE_STATIC_KEY_FALSE(devmap_managed_key); -static inline bool put_devmap_managed_page(struct page *page) + +static inline bool page_is_devmap_managed(struct page *page) { if (!static_branch_unlikely(&devmap_managed_key)) return false; @@ -983,7 +984,6 @@ static inline bool put_devmap_managed_page(struct page *page) switch (page->pgmap->type) { case MEMORY_DEVICE_PRIVATE: case MEMORY_DEVICE_FS_DAX: - __put_devmap_managed_page(page); return true; default: break; @@ -991,6 +991,19 @@ static inline bool put_devmap_managed_page(struct page *page) return false; } +static inline bool put_devmap_managed_page(struct page *page) +{ + bool is_devmap = page_is_devmap_managed(page); + + if (is_devmap) { + int count = page_ref_dec_return(page); + + __put_devmap_managed_page(page, count); + } + + return is_devmap; +} + #else /* CONFIG_DEV_PAGEMAP_OPS */ static inline bool put_devmap_managed_page(struct page *page) { @@ -1038,6 +1051,8 @@ static inline __must_check bool try_get_page(struct page *page) return true; } +__must_check bool user_page_ref_inc(struct page *page); + static inline void put_page(struct page *page) { page = compound_head(page); @@ -1055,31 +1070,56 @@ static inline void put_page(struct page *page) __put_page(page); } -/** - * put_user_page() - release a gup-pinned page - * @page: pointer to page to be released +/* + * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload + * the page's refcount so that two separate items are tracked: the original page + * reference count, and also a new count of how many get_user_pages() calls were + * made against the page. ("gup-pinned" is another term for the latter). + * + * With this scheme, get_user_pages() becomes special: such pages are marked + * as distinct from normal pages. As such, the new put_user_page() call (and + * its variants) must be used in order to release gup-pinned pages. + * + * Choice of value: * - * Pages that were pinned via get_user_pages*() must be released via - * either put_user_page(), or one of the put_user_pages*() routines - * below. This is so that eventually, pages that are pinned via - * get_user_pages*() can be separately tracked and uniquely handled. In - * particular, interactions with RDMA and filesystems need special - * handling. + * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference + * counts with respect to get_user_pages() and put_user_page() becomes simpler, + * due to the fact that adding an even power of two to the page refcount has + * the effect of using only the upper N bits, for the code that counts up using + * the bias value. This means that the lower bits are left for the exclusive + * use of the original code that increments and decrements by one (or at least, + * by much smaller values than the bias value). * - * put_user_page() and put_page() are not interchangeable, despite this early - * implementation that makes them look the same. put_user_page() calls must - * be perfectly matched up with get_user_page() calls. + * Of course, once the lower bits overflow into the upper bits (and this is + * OK, because subtraction recovers the original values), then visual inspection + * no longer suffices to directly view the separate counts. However, for normal + * applications that don't have huge page reference counts, this won't be an + * issue. + * + * Locking: the lockless algorithm described in page_cache_get_speculative() + * and page_cache_gup_pin_speculative() provides safe operation for + * get_user_pages and page_mkclean and other calls that race to set up page + * table entries. */ -static inline void put_user_page(struct page *page) -{ - put_page(page); -} +#define GUP_PIN_COUNTING_BIAS (1UL << 10) +void put_user_page(struct page *page); void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, bool make_dirty); - void put_user_pages(struct page **pages, unsigned long npages); +/** + * page_dma_pinned() - report if a page is pinned by a call to pin_user_pages*() + * or pin_longterm_pages*() + * @page: pointer to page to be queried. + * @Return: True, if it is likely that the page has been "dma-pinned". + * False, if the page is definitely not dma-pinned. + */ +static inline bool page_dma_pinned(struct page *page) +{ + return (page_ref_count(compound_head(page))) >= GUP_PIN_COUNTING_BIAS; +} + #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define SECTION_IN_PAGE_FLAGS #endif diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index bda20282746b..0485cba38d23 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -244,6 +244,8 @@ enum node_stat_item { NR_DIRTIED, /* page dirtyings since bootup */ NR_WRITTEN, /* page writings since bootup */ NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */ + NR_FOLL_PIN_REQUESTED, /* via: pin_user_page(), gup flag: FOLL_PIN */ + NR_FOLL_PIN_RETURNED, /* pages returned via put_user_page() */ NR_VM_NODE_STAT_ITEMS }; diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h index 14d14beb1f7f..b9cbe553d1e7 100644 --- a/include/linux/page_ref.h +++ b/include/linux/page_ref.h @@ -102,6 +102,16 @@ static inline void page_ref_sub(struct page *page, int nr) __page_ref_mod(page, -nr); } +static inline int page_ref_sub_return(struct page *page, int nr) +{ + int ret = atomic_sub_return(nr, &page->_refcount); + + if (page_ref_tracepoint_active(__tracepoint_page_ref_mod)) + __page_ref_mod(page, -nr); + + return ret; +} + static inline void page_ref_inc(struct page *page) { atomic_inc(&page->_refcount); diff --git a/mm/gup.c b/mm/gup.c index 8694bc7b3df3..e51b3820a995 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -29,6 +29,102 @@ struct follow_page_context { unsigned int page_mask; }; +/* + * Return the compound head page with ref appropriately incremented, + * or NULL if that failed. + */ +static inline struct page *try_get_compound_head(struct page *page, int refs) +{ + struct page *head = compound_head(page); + + if (WARN_ON_ONCE(page_ref_count(head) < 0)) + return NULL; + if (unlikely(!page_cache_add_speculative(head, refs))) + return NULL; + return head; +} + +#ifdef CONFIG_DEBUG_VM +static inline void __update_proc_vmstat(struct page *page, + enum node_stat_item item, int count) +{ + mod_node_page_state(page_pgdat(page), item, count); +} +#else +static inline void __update_proc_vmstat(struct page *page, + enum node_stat_item item, int count) +{ +} +#endif + +/** + * user_page_ref_inc() - mark a page as being used by get_user_pages(FOLL_PIN). + * + * @page: pointer to page to be marked + * @Return: true for success, false for failure + */ +__must_check bool user_page_ref_inc(struct page *page) +{ + page = try_get_compound_head(page, GUP_PIN_COUNTING_BIAS); + if (!page) + return false; + + __update_proc_vmstat(page, NR_FOLL_PIN_REQUESTED, 1); + return true; +} + +#ifdef CONFIG_DEV_PAGEMAP_OPS +static bool __put_devmap_managed_user_page(struct page *page) +{ + bool is_devmap = page_is_devmap_managed(page); + + if (is_devmap) { + int count = page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS); + + __update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1); + __put_devmap_managed_page(page, count); + } + + return is_devmap; +} +#else +static bool __put_devmap_managed_user_page(struct page *page) +{ + return false; +} +#endif /* CONFIG_DEV_PAGEMAP_OPS */ + +/** + * put_user_page() - release a gup-pinned page + * @page: pointer to page to be released + * + * Pages that were pinned via get_user_pages*() must be released via + * either put_user_page(), or one of the put_user_pages*() routines + * below. This is so that eventually, pages that are pinned via + * get_user_pages*() can be separately tracked and uniquely handled. In + * particular, interactions with RDMA and filesystems need special + * handling. + */ +void put_user_page(struct page *page) +{ + page = compound_head(page); + + /* + * For devmap managed pages we need to catch refcount transition from + * GUP_PIN_COUNTING_BIAS to 1, when refcount reach one it means the + * page is free and we need to inform the device driver through + * callback. See include/linux/memremap.h and HMM for details. + */ + if (__put_devmap_managed_user_page(page)) + return; + + if (page_ref_sub_and_test(page, GUP_PIN_COUNTING_BIAS)) + __put_page(page); + + __update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1); +} +EXPORT_SYMBOL(put_user_page); + /** * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages * @pages: array of pages to be maybe marked dirty, and definitely released. @@ -215,10 +311,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } page = vm_normal_page(vma, address, pte); - if (!page && pte_devmap(pte) && (flags & FOLL_GET)) { + if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) { /* - * Only return device mapping pages in the FOLL_GET case since - * they are only valid while holding the pgmap reference. + * Only return device mapping pages in the FOLL_GET or FOLL_PIN + * case since they are only valid while holding the pgmap + * reference. */ *pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap); if (*pgmap) @@ -261,6 +358,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, page = ERR_PTR(-ENOMEM); goto out; } + } else if (flags & FOLL_PIN) { + if (unlikely(!user_page_ref_inc(page))) { + page = ERR_PTR(-ENOMEM); + goto out; + } } if (flags & FOLL_TOUCH) { if ((flags & FOLL_WRITE) && @@ -522,8 +624,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, /* make this handle hugepd */ page = follow_huge_addr(mm, address, flags & FOLL_WRITE); if (!IS_ERR(page)) { - BUG_ON(flags & FOLL_GET); - return page; + WARN_ON_ONCE(flags & (FOLL_GET | FOLL_PIN)); + return NULL; } pgd = pgd_offset(mm, address); @@ -1824,30 +1926,20 @@ static inline pte_t gup_get_pte(pte_t *ptep) #endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start, + unsigned int flags, struct page **pages) { while ((*nr) - nr_start) { struct page *page = pages[--(*nr)]; ClearPageReferenced(page); - put_page(page); + if (flags & FOLL_PIN) + put_user_page(page); + else + put_page(page); } } -/* - * Return the compund head page with ref appropriately incremented, - * or NULL if that failed. - */ -static inline struct page *try_get_compound_head(struct page *page, int refs) -{ - struct page *head = compound_head(page); - if (WARN_ON_ONCE(page_ref_count(head) < 0)) - return NULL; - if (unlikely(!page_cache_add_speculative(head, refs))) - return NULL; - return head; -} - #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) @@ -1877,7 +1969,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, pgmap = get_dev_pagemap(pte_pfn(pte), pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); goto pte_unmap; } } else if (pte_special(pte)) @@ -1886,9 +1978,15 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - head = try_get_compound_head(page, 1); - if (!head) - goto pte_unmap; + if (flags & FOLL_PIN) { + head = page; + if (unlikely(!user_page_ref_inc(head))) + goto pte_unmap; + } else { + head = try_get_compound_head(page, 1); + if (!head) + goto pte_unmap; + } if (unlikely(pte_val(pte) != pte_val(*ptep))) { put_page(head); @@ -1942,12 +2040,20 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, pgmap = get_dev_pagemap(pfn, pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } SetPageReferenced(page); pages[*nr] = page; - get_page(page); + + if (flags & FOLL_PIN) { + if (unlikely(!user_page_ref_inc(page))) { + undo_dev_pagemap(nr, nr_start, flags, pages); + return 0; + } + } else + get_page(page); + (*nr)++; pfn++; } while (addr += PAGE_SIZE, addr != end); @@ -1969,7 +2075,7 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } return 1; @@ -1987,7 +2093,7 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } return 1; @@ -2072,9 +2178,16 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, page = head + ((addr & (sz-1)) >> PAGE_SHIFT); refs = __record_subpages(page, addr, end, pages, *nr); - head = try_get_compound_head(head, refs); - if (!head) - return 0; + if (flags & FOLL_PIN) { + head = page; + if (unlikely(!user_page_ref_inc(head))) + return 0; + head = page; + } else { + head = try_get_compound_head(head, refs); + if (!head) + return 0; + } if (unlikely(pte_val(pte) != pte_val(*ptep))) { __remove_refs_from_head(head, refs); @@ -2129,9 +2242,15 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); refs = __record_subpages(page, addr, end, pages, *nr); - head = try_get_compound_head(pmd_page(orig), refs); - if (!head) - return 0; + if (flags & FOLL_PIN) { + head = page; + if (unlikely(!user_page_ref_inc(head))) + return 0; + } else { + head = try_get_compound_head(pmd_page(orig), refs); + if (!head) + return 0; + } if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { __remove_refs_from_head(head, refs); @@ -2160,9 +2279,15 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); refs = __record_subpages(page, addr, end, pages, *nr); - head = try_get_compound_head(pud_page(orig), refs); - if (!head) - return 0; + if (flags & FOLL_PIN) { + head = page; + if (unlikely(!user_page_ref_inc(head))) + return 0; + } else { + head = try_get_compound_head(pud_page(orig), refs); + if (!head) + return 0; + } if (unlikely(pud_val(orig) != pud_val(*pudp))) { __remove_refs_from_head(head, refs); @@ -2186,9 +2311,15 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); refs = __record_subpages(page, addr, end, pages, *nr); - head = try_get_compound_head(pgd_page(orig), refs); - if (!head) - return 0; + if (flags & FOLL_PIN) { + head = page; + if (unlikely(!user_page_ref_inc(head))) + return 0; + } else { + head = try_get_compound_head(pgd_page(orig), refs); + if (!head) + return 0; + } if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) { __remove_refs_from_head(head, refs); @@ -2414,7 +2545,7 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages, unsigned long addr, len, end; int nr = 0, ret = 0; - if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) + if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | FOLL_PIN))) return -EINVAL; start = untagged_addr(start) & PAGE_MASK; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 13cc93785006..66bf4c8b88f1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -945,6 +945,11 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, */ WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set"); + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return NULL; + if (flags & FOLL_WRITE && !pmd_write(*pmd)) return NULL; @@ -960,7 +965,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, * device mapped pages can only be returned if the * caller will manage the page reference count. */ - if (!(flags & FOLL_GET)) + if (!(flags & (FOLL_GET | FOLL_PIN))) return ERR_PTR(-EEXIST); pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT; @@ -968,7 +973,12 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, if (!*pgmap) return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); - get_page(page); + + if (flags & FOLL_GET) + get_page(page); + else if (flags & FOLL_PIN) + if (unlikely(!user_page_ref_inc(page))) + page = ERR_PTR(-ENOMEM); return page; } @@ -1088,6 +1098,11 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, if (flags & FOLL_WRITE && !pud_write(*pud)) return NULL; + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return NULL; + if (pud_present(*pud) && pud_devmap(*pud)) /* pass */; else @@ -1100,7 +1115,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, * device mapped pages can only be returned if the * caller will manage the page reference count. */ - if (!(flags & FOLL_GET)) + if (!(flags & (FOLL_GET | FOLL_PIN))) return ERR_PTR(-EEXIST); pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT; @@ -1108,7 +1123,12 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, if (!*pgmap) return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); - get_page(page); + + if (flags & FOLL_GET) + get_page(page); + else if (flags & FOLL_PIN) + if (unlikely(!user_page_ref_inc(page))) + page = ERR_PTR(-ENOMEM); return page; } @@ -1522,8 +1542,12 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, skip_mlock: page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT; VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page); + if (flags & FOLL_GET) get_page(page); + else if (flags & FOLL_PIN) + if (unlikely(!user_page_ref_inc(page))) + page = NULL; out: return page; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b45a95363a84..da335b1cd798 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4462,7 +4462,17 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, same_page: if (pages) { pages[i] = mem_map_offset(page, pfn_offset); - get_page(pages[i]); + + if (flags & FOLL_GET) + get_page(pages[i]); + else if (flags & FOLL_PIN) + if (unlikely(!user_page_ref_inc(pages[i]))) { + spin_unlock(ptl); + remainder = 0; + err = -ENOMEM; + WARN_ON_ONCE(1); + break; + } } if (vmas) @@ -5022,6 +5032,12 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, struct page *page = NULL; spinlock_t *ptl; pte_t pte; + + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return NULL; + retry: ptl = pmd_lockptr(mm, pmd); spin_lock(ptl); @@ -5034,8 +5050,14 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, pte = huge_ptep_get((pte_t *)pmd); if (pte_present(pte)) { page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT); + if (flags & FOLL_GET) get_page(page); + else if (flags & FOLL_PIN) + if (unlikely(!user_page_ref_inc(page))) { + page = NULL; + goto out; + } } else { if (is_hugetlb_entry_migration(pte)) { spin_unlock(ptl); @@ -5056,7 +5078,7 @@ struct page * __weak follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int flags) { - if (flags & FOLL_GET) + if (flags & (FOLL_GET | FOLL_PIN)) return NULL; return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT); @@ -5065,7 +5087,7 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address, struct page * __weak follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int flags) { - if (flags & FOLL_GET) + if (flags & (FOLL_GET | FOLL_PIN)) return NULL; return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT); diff --git a/mm/memremap.c b/mm/memremap.c index 03ccbdfeb697..3b1c69df1d2a 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -410,10 +410,8 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, EXPORT_SYMBOL_GPL(get_dev_pagemap); #ifdef CONFIG_DEV_PAGEMAP_OPS -void __put_devmap_managed_page(struct page *page) +void __put_devmap_managed_page(struct page *page, int count) { - int count = page_ref_dec_return(page); - /* * If refcount is 1 then page is freed and refcount is stable as nobody * holds a reference on the page. diff --git a/mm/vmstat.c b/mm/vmstat.c index 6afc892a148a..65c027d9b637 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1167,6 +1167,8 @@ const char * const vmstat_text[] = { "nr_dirtied", "nr_written", "nr_kernel_misc_reclaimable", + "nr_foll_pin_requested", + "nr_foll_pin_returned", /* enum writeback_stat_item counters */ "nr_dirty_threshold", From patchwork Wed Oct 30 22:49:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220235 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3C6714DB for ; Wed, 30 Oct 2019 22:50:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C1A0F2190F for ; Wed, 30 Oct 2019 22:50:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="n/jU6RrW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727781AbfJ3WuA (ORCPT ); Wed, 30 Oct 2019 18:50:00 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:1459 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727731AbfJ3Wt4 (ORCPT ); Wed, 30 Oct 2019 18:49:56 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:57 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:51 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:51 -0700 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:50 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:49 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:49 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 13/19] media/v4l2-core: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion Date: Wed, 30 Oct 2019 15:49:24 -0700 Message-ID: <20191030224930.3990755-14-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475797; bh=liO76MrwJT5y5IUfrUHPe1ML08BZFmwb/vxBRrGctqM=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=n/jU6RrW0ZjttNabcpGwQPPab1gIeZFpT4fnLGvVm243QsafiVbh9DI2+2S4wRLXV muZvRf/8uBCxVsC+mzS9tiK6baXFDniJzFsfdLPJcrW9n74Tt2fvUV/p3gtTXpB/mV tvZO4ZDCmJbUhjlTXXb3QYEDe1CQZAWnPCmuKToxabj6E+MYjYQozWYENsHdKPq6A7 L546SbyRShX9dmy+dRTwnDa6eyRVCLbCMk5HVM8PUGjOk3PV94wUozgY11xDMrcKrH PDcINmVzC2eOVqxsHyv4UF+YTfsp0jwhRmqltzQUSz2iYiUp2Sax48itEzhx5AdORO bd9OAN2qf8zWw== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org 1. Change v4l2 from get_user_pages(FOLL_LONGTERM), to pin_longterm_pages(), which sets both FOLL_LONGTERM and FOLL_PIN. 2. Because all FOLL_PIN-acquired pages must be released via put_user_page(), also convert the put_page() call over to put_user_pages_dirty_lock(). Cc: Mauro Carvalho Chehab Signed-off-by: John Hubbard Reviewed-by: Ira Weiny --- drivers/media/v4l2-core/videobuf-dma-sg.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 28262190c3ab..9b9c5b37bf59 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -183,12 +183,12 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma, dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n", data, size, dma->nr_pages); - err = get_user_pages(data & PAGE_MASK, dma->nr_pages, - flags | FOLL_LONGTERM, dma->pages, NULL); + err = pin_longterm_pages(data & PAGE_MASK, dma->nr_pages, + flags, dma->pages, NULL); if (err != dma->nr_pages) { dma->nr_pages = (err >= 0) ? err : 0; - dprintk(1, "get_user_pages: err=%d [%d]\n", err, + dprintk(1, "pin_longterm_pages: err=%d [%d]\n", err, dma->nr_pages); return err < 0 ? err : -EINVAL; } @@ -349,11 +349,8 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma) BUG_ON(dma->sglen); if (dma->pages) { - for (i = 0; i < dma->nr_pages; i++) { - if (dma->direction == DMA_FROM_DEVICE) - set_page_dirty_lock(dma->pages[i]); - put_page(dma->pages[i]); - } + put_user_pages_dirty_lock(dma->pages, dma->nr_pages, + dma->direction == DMA_FROM_DEVICE); kfree(dma->pages); dma->pages = NULL; } From patchwork Wed Oct 30 22:49:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220259 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6B1C814DB for ; Wed, 30 Oct 2019 22:50:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3FEFF21924 for ; Wed, 30 Oct 2019 22:50:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="NIyNWer0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727777AbfJ3Wt7 (ORCPT ); Wed, 30 Oct 2019 18:49:59 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10895 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727721AbfJ3Wtz (ORCPT ); Wed, 30 Oct 2019 18:49:55 -0400 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:58 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:52 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 30 Oct 2019 15:49:52 -0700 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:52 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:50 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:50 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 14/19] vfio, mm: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion Date: Wed, 30 Oct 2019 15:49:25 -0700 Message-ID: <20191030224930.3990755-15-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475798; bh=XFjm8RPGYOGaZ8I6ZtqUF2w9I0TfFUQHc/AIPIbYpP8=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=NIyNWer02EnMgKZ9CI4sXwt3Ux9yX4ajp4i5vLHbPveoh8o54s25FzHdtrjaaC5Lf VfosVhiF7ivR2LFLOPHJF45E2C9griCDCZRjJzvkuAwvzJz5oQxrjk1W2FBNv/t05y K9WA8xzVRUfeJTJJYCffPgb4TWKk5kKxW/6FDvZb5VL2fGYJ2P5PWJ7kjzOaJFhGUd XF31UGOtBYD4QLg72qOOtNhZF+bY9kaO0SKRfg3xZP8fvWiZGFBKcdkpbddjVG5j4t 0VemY1URa0sH/nhlOs0AYfqaIAHz3lVPWpgsYIljNs0B6piBSdJ57j0yNKS3kIokdn dWyNgFVoj/U2Q== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This also fixes one or two likely bugs. 1. Change vfio from get_user_pages(FOLL_LONGTERM), to pin_longterm_pages(), which sets both FOLL_LONGTERM and FOLL_PIN. Note that this is a change in behavior, because the get_user_pages_remote() call was not setting FOLL_LONGTERM, but the new pin_user_pages_remote() call that replaces it, *is* setting FOLL_LONGTERM. It is important to set FOLL_LONGTERM, because the DMA case requires it. Please see the FOLL_PIN documentation in include/linux/mm.h, and Documentation/pin_user_pages.rst for details. 2. Because all FOLL_PIN-acquired pages must be released via put_user_page(), also convert the put_page() call over to put_user_pages(). Note that this effectively changes the code's behavior in vfio_iommu_type1.c: put_pfn(): it now ultimately calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de Cc: Alex Williamson Signed-off-by: John Hubbard --- drivers/vfio/vfio_iommu_type1.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index d864277ea16f..795e13f3ef08 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -327,9 +327,8 @@ static int put_pfn(unsigned long pfn, int prot) { if (!is_invalid_reserved_pfn(pfn)) { struct page *page = pfn_to_page(pfn); - if (prot & IOMMU_WRITE) - SetPageDirty(page); - put_page(page); + + put_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE); return 1; } return 0; @@ -349,11 +348,11 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, down_read(&mm->mmap_sem); if (mm == current->mm) { - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, - vmas); + ret = pin_longterm_pages(vaddr, 1, flags, page, vmas); } else { - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, - vmas, NULL); + ret = pin_longterm_pages_remote(NULL, mm, vaddr, 1, + flags, page, vmas, + NULL); /* * The lifetime of a vaddr_get_pfn() page pin is * userspace-controlled. In the fs-dax case this could @@ -363,7 +362,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, */ if (ret > 0 && vma_is_fsdax(vmas[0])) { ret = -EOPNOTSUPP; - put_page(page[0]); + put_user_page(page[0]); } } up_read(&mm->mmap_sem); From patchwork Wed Oct 30 22:49:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220297 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6320F13B1 for ; Wed, 30 Oct 2019 22:51:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2F9BD2190F for ; Wed, 30 Oct 2019 22:51:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="qG/d8CXm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727993AbfJ3WvG (ORCPT ); Wed, 30 Oct 2019 18:51:06 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10909 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727730AbfJ3Wtz (ORCPT ); Wed, 30 Oct 2019 18:49:55 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:49:59 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:53 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:53 -0700 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:53 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:52 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:51 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 15/19] powerpc: book3s64: convert to pin_longterm_pages() and put_user_page() Date: Wed, 30 Oct 2019 15:49:26 -0700 Message-ID: <20191030224930.3990755-16-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475799; bh=s7fqcPpOnu8FKfGfhgVfAy1waz0rDtbaZTRbQ6blVPc=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=qG/d8CXmRQ3gipI0xCfB+AF3znMbqRBtbSrisMFG1hBwHCaDuQihEdYQXf92GADhH B41r++nRUhS/uyHiDctaxr/nWwnnjUfVqpQfQ4CBEyBBBtDteiqxkjaEBUZJqY60Lj SS1qgoVbIY5nTAejmtzMmICWY3ghFXEyJU00G62teGtMWxf8NFehGpVcye+uQlZJ4X JnKM7x/JtU5w0ztu1osW2m2Oqhx5b0oExjXmdcbqkc51A7xA5/TLwBY3o0eEeCmors FD4vqZjDiCUjsW29+iE0+XfDhX7lYIKV/FGY1ocD0YKU7Gdajz+/xjx0JUrhy+/QrV NtT8saGwtAXhw== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org 1. Convert from get_user_pages(FOLL_LONGTERM) to pin_longterm_pages(). 2. As required by pin_user_pages(), release these pages via put_user_page(). In this case, do so via put_user_pages_dirty_lock(). That has the side effect of calling set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] 3. Release each page in mem->hpages[] (instead of mem->hpas[]), because that is the array that pin_longterm_pages() filled in. This is more accurate and should be a little safer from a maintenance point of view. [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de Signed-off-by: John Hubbard --- arch/powerpc/mm/book3s64/iommu_api.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c index 56cc84520577..69d79cb50d47 100644 --- a/arch/powerpc/mm/book3s64/iommu_api.c +++ b/arch/powerpc/mm/book3s64/iommu_api.c @@ -103,9 +103,8 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, for (entry = 0; entry < entries; entry += chunk) { unsigned long n = min(entries - entry, chunk); - ret = get_user_pages(ua + (entry << PAGE_SHIFT), n, - FOLL_WRITE | FOLL_LONGTERM, - mem->hpages + entry, NULL); + ret = pin_longterm_pages(ua + (entry << PAGE_SHIFT), n, + FOLL_WRITE, mem->hpages + entry, NULL); if (ret == n) { pinned += n; continue; @@ -167,9 +166,8 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, return 0; free_exit: - /* free the reference taken */ - for (i = 0; i < pinned; i++) - put_page(mem->hpages[i]); + /* free the references taken */ + put_user_pages(mem->hpages, pinned); vfree(mem->hpas); kfree(mem); @@ -212,10 +210,9 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem) if (!page) continue; - if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY) - SetPageDirty(page); + put_user_pages_dirty_lock(&mem->hpages[i], 1, + MM_IOMMU_TABLE_GROUP_PAGE_DIRTY); - put_page(page); mem->hpas[i] = 0; } } From patchwork Wed Oct 30 22:49:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220287 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8AA2F1599 for ; Wed, 30 Oct 2019 22:51:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 67BA82067D for ; Wed, 30 Oct 2019 22:51:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="GPoFrGMn" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727958AbfJ3Wuz (ORCPT ); Wed, 30 Oct 2019 18:50:55 -0400 Received: from hqemgate15.nvidia.com ([216.228.121.64]:5321 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727739AbfJ3Wt6 (ORCPT ); Wed, 30 Oct 2019 18:49:58 -0400 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:50:02 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:55 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 30 Oct 2019 15:49:55 -0700 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:54 +0000 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:54 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:53 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:53 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 16/19] mm/gup_benchmark: support pin_user_pages() and related calls Date: Wed, 30 Oct 2019 15:49:27 -0700 Message-ID: <20191030224930.3990755-17-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475802; bh=NEkIIi5tFETxHd5VvyaR+PvOubiu9ODuVDN+PrzJrb8=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=GPoFrGMnaDhOWp8GyQn0Iz8SHH7eHN+cbpDC5R0YZAmoeGNuZd0OsUVclAZcYTsUx eVFCUHhVe8xueyA+mhLoZgmFLP56IkOmsdgrT1s1U+XnZykmEFCHBtI7fiXghL8GC2 ec/IufxfUuDdWZcryzf2UGAdIKJLe7FL7rpyU07e1o4X1A1RKullE+UxV6BpDFTdx0 DEk45IxocUaPHg/9Sr0quApcAVrs5yCoflCBukhhGStahSG5y5ckfUzGQLynsl7I3b xg/h/y9ea0tQv6KApVYyw4mmYj35cGXHkf820zky2XbK2xowx+CbUoMIIYZ7W+wuxq u2rc9JLTbaBJw== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Up until now, gup_benchmark supported testing of the following kernel functions: * get_user_pages(): via the '-U' command line option * get_user_pages_longterm(): via the '-L' command line option * get_user_pages_fast(): as the default (no options required) Add test coverage for the new corresponding pin_*() functions: * pin_user_pages(): via the '-c' command line option * pin_longterm_pages(): via the '-b' command line option * pin_user_pages_fast(): via the '-a' command line option Also, add an option for clarity: '-u' for what is now (still) the default choice: get_user_pages_fast(). Also, for the three commands that set FOLL_PIN, verify that the pages really are dma-pinned, via the new is_dma_pinned() routine. Those commands are: PIN_FAST_BENCHMARK : calls pin_user_pages_fast() PIN_LONGTERM_BENCHMARK : calls pin_longterm_pages() PIN_BENCHMARK : calls pin_user_pages() In between the calls to pin_*() and put_user_pages(), check each page: if page_dma_pinned() returns false, then WARN and return. Do this outside of the benchmark timestamps, so that it doesn't affect reported times. Signed-off-by: John Hubbard --- mm/gup_benchmark.c | 74 ++++++++++++++++++++-- tools/testing/selftests/vm/gup_benchmark.c | 23 ++++++- 2 files changed, 91 insertions(+), 6 deletions(-) diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 7dd602d7f8db..2bb0f5df4803 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -8,6 +8,9 @@ #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) +#define PIN_FAST_BENCHMARK _IOWR('g', 4, struct gup_benchmark) +#define PIN_LONGTERM_BENCHMARK _IOWR('g', 5, struct gup_benchmark) +#define PIN_BENCHMARK _IOWR('g', 6, struct gup_benchmark) struct gup_benchmark { __u64 get_delta_usec; @@ -19,6 +22,44 @@ struct gup_benchmark { __u64 expansion[10]; /* For future use */ }; +static void put_back_pages(int cmd, struct page **pages, unsigned long nr_pages) +{ + int i; + + switch (cmd) { + case GUP_FAST_BENCHMARK: + case GUP_LONGTERM_BENCHMARK: + case GUP_BENCHMARK: + for (i = 0; i < nr_pages; i++) + put_page(pages[i]); + break; + + case PIN_FAST_BENCHMARK: + case PIN_LONGTERM_BENCHMARK: + case PIN_BENCHMARK: + put_user_pages(pages, nr_pages); + break; + } +} + +static void verify_dma_pinned(int cmd, struct page **pages, + unsigned long nr_pages) +{ + int i; + + switch (cmd) { + case PIN_FAST_BENCHMARK: + case PIN_LONGTERM_BENCHMARK: + case PIN_BENCHMARK: + for (i = 0; i < nr_pages; i++) { + if (WARN(!page_dma_pinned(pages[i]), + "pages[%d] is NOT dma-pinned\n", i)) + break; + } + break; + } +} + static int __gup_benchmark_ioctl(unsigned int cmd, struct gup_benchmark *gup) { @@ -62,6 +103,19 @@ static int __gup_benchmark_ioctl(unsigned int cmd, nr = get_user_pages(addr, nr, gup->flags & 1, pages + i, NULL); break; + case PIN_FAST_BENCHMARK: + nr = pin_user_pages_fast(addr, nr, gup->flags & 1, + pages + i); + break; + case PIN_LONGTERM_BENCHMARK: + nr = pin_longterm_pages(addr, nr, + (gup->flags & 1), + pages + i, NULL); + break; + case PIN_BENCHMARK: + nr = pin_user_pages(addr, nr, gup->flags & 1, pages + i, + NULL); + break; default: return -1; } @@ -72,15 +126,22 @@ static int __gup_benchmark_ioctl(unsigned int cmd, } end_time = ktime_get(); + /* Shifting the meaning of nr_pages: now it is actual number pinned: */ + nr_pages = i; + gup->get_delta_usec = ktime_us_delta(end_time, start_time); gup->size = addr - gup->addr; + /* + * Take an un-benchmark-timed moment to verify DMA pinned + * state: print a warning if any non-dma-pinned pages are found: + */ + verify_dma_pinned(cmd, pages, nr_pages); + start_time = ktime_get(); - for (i = 0; i < nr_pages; i++) { - if (!pages[i]) - break; - put_page(pages[i]); - } + + put_back_pages(cmd, pages, nr_pages); + end_time = ktime_get(); gup->put_delta_usec = ktime_us_delta(end_time, start_time); @@ -98,6 +159,9 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd, case GUP_FAST_BENCHMARK: case GUP_LONGTERM_BENCHMARK: case GUP_BENCHMARK: + case PIN_FAST_BENCHMARK: + case PIN_LONGTERM_BENCHMARK: + case PIN_BENCHMARK: break; default: return -EINVAL; diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index 485cf06ef013..c5c934c0f402 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -18,6 +18,15 @@ #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) +/* + * Similar to above, but use FOLL_PIN instead of FOLL_GET. This is done + * by calling pin_user_pages_fast(), pin_longterm_pages(), and pin_user_pages(), + * respectively. + */ +#define PIN_FAST_BENCHMARK _IOWR('g', 4, struct gup_benchmark) +#define PIN_LONGTERM_BENCHMARK _IOWR('g', 5, struct gup_benchmark) +#define PIN_BENCHMARK _IOWR('g', 6, struct gup_benchmark) + struct gup_benchmark { __u64 get_delta_usec; __u64 put_delta_usec; @@ -37,8 +46,17 @@ int main(int argc, char **argv) char *file = "/dev/zero"; char *p; - while ((opt = getopt(argc, argv, "m:r:n:f:tTLUwSH")) != -1) { + while ((opt = getopt(argc, argv, "m:r:n:f:abctTLUuwSH")) != -1) { switch (opt) { + case 'a': + cmd = PIN_FAST_BENCHMARK; + break; + case 'b': + cmd = PIN_LONGTERM_BENCHMARK; + break; + case 'c': + cmd = PIN_BENCHMARK; + break; case 'm': size = atoi(optarg) * MB; break; @@ -60,6 +78,9 @@ int main(int argc, char **argv) case 'U': cmd = GUP_BENCHMARK; break; + case 'u': + cmd = GUP_FAST_BENCHMARK; + break; case 'w': write = 1; break; From patchwork Wed Oct 30 22:49:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220233 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1764F1747 for ; Wed, 30 Oct 2019 22:50:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E91202067D for ; Wed, 30 Oct 2019 22:50:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="XMlwzmls" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727791AbfJ3WuA (ORCPT ); Wed, 30 Oct 2019 18:50:00 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10933 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727695AbfJ3Wt7 (ORCPT ); Wed, 30 Oct 2019 18:49:59 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:50:02 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:56 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:56 -0700 Received: from HQMAIL101.nvidia.com (172.20.187.10) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:56 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:54 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 17/19] selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage Date: Wed, 30 Oct 2019 15:49:28 -0700 Message-ID: <20191030224930.3990755-18-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475802; bh=rwvy5scTy5JLg+SKsytZrhoxXGIN7QSORs5+YcV8QqY=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=XMlwzmlsux1x0Db3R9XtBtpzjYXCTvq1+eH82SOkL7qTi9NIeyo2RiVOugrlMnjXp j/blE4LbMu++iqb3TbjZXJt1U/KnPH2iz7i2Q4sFaLCCgAdF8PND1X7rBXqioEEq8Y 1vN8JSqiWFPRV1/5KAyyd/2JtkDJdFak6VIckw/yDbNnHJznIDf2Pm0gLd3fIW5SAo LDRD4oRm1cX1XHRjUB/PWhseuEDhqDXyBVYbFPmMpNl97r8d3/wKZ6uLHzaoEe1wjo ZPUzAHtbjU4zpX5BEyECGbg9F/hknFZvXkKdhvHrRgQVy6Gi0jX5prZZqD/X8QGmxF dREPidPh4E87g== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org It's good to have basic unit test coverage of the new FOLL_PIN behavior. Fortunately, the gup_benchmark unit test is extremely fast (a few milliseconds), so adding it the the run_vmtests suite is going to cause no noticeable change in running time. So, add two new invocations to run_vmtests: 1) Run gup_benchmark with normal get_user_pages(). 2) Run gup_benchmark with pin_user_pages(). This is much like the first call, except that it sets FOLL_PIN. Running these two in quick succession also provide a visual comparison of the running times, which is convenient. The new invocations are fairly early in the run_vmtests script, because with test suites, it's usually preferable to put the shorter, faster tests first, all other things being equal. Signed-off-by: John Hubbard --- tools/testing/selftests/vm/run_vmtests | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests index 951c507a27f7..93e8dc9a7cad 100755 --- a/tools/testing/selftests/vm/run_vmtests +++ b/tools/testing/selftests/vm/run_vmtests @@ -104,6 +104,28 @@ echo "NOTE: The above hugetlb tests provide minimal coverage. Use" echo " https://github.com/libhugetlbfs/libhugetlbfs.git for" echo " hugetlb regression testing." +echo "--------------------------------------------" +echo "running 'gup_benchmark -U' (normal/slow gup)" +echo "--------------------------------------------" +./gup_benchmark -U +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + +echo "------------------------------------------" +echo "running gup_benchmark -c (pin_user_pages)" +echo "------------------------------------------" +./gup_benchmark -c +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + echo "-------------------" echo "running userfaultfd" echo "-------------------" From patchwork Wed Oct 30 22:49:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220261 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DAADF13B1 for ; Wed, 30 Oct 2019 22:50:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AD61B2190F for ; Wed, 30 Oct 2019 22:50:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="IkuRtYB4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727933AbfJ3Wul (ORCPT ); Wed, 30 Oct 2019 18:50:41 -0400 Received: from hqemgate15.nvidia.com ([216.228.121.64]:5340 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727756AbfJ3Wt7 (ORCPT ); Wed, 30 Oct 2019 18:49:59 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:50:04 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:57 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:57 -0700 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:57 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:55 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 18/19] mm/gup: remove support for gup(FOLL_LONGTERM) Date: Wed, 30 Oct 2019 15:49:29 -0700 Message-ID: <20191030224930.3990755-19-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475804; bh=6eCoZ62YVqPRMh6hmCwd736iryX7QnOeiQKLTn0GI5E=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=IkuRtYB4d3kOJrZ+EENymTxZvsMF+ZD6tsSf6fZglZMnwtzOZ73h77ci85UZpFmUW fQmmlzdC8n8mJLqJjmlbdC3DO4uaZw0YHORB/HafxMGpmFoCkTJ32qpfQg459tX+YK +Gf494GooD+nQM+FC8NLkVHrkhEl1KDPfLs3huMk5X62opwbgjDWtnHAEEMzBGnI99 TD+I6HBpEKDObno8f3eOMuEVqd6qK/ifgd+derpKD20XS8ykxtZtU6JDOQRPdAIVXL nhlxBDkBgZJpnJUcnjsjfG8G3sDLivqNtRrrO3BSwLQ5qluS/cq192ATGynOsvcFqk qSCcSKzrPnw1A== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Now that all other kernel callers of get_user_pages(FOLL_LONGTERM) have been converted to pin_longterm_pages(), lock it down: 1) Add an assertion to get_user_pages(), preventing callers from passing FOLL_LONGTERM (in addition to the existing assertion that prevents FOLL_PIN). 2) Remove the associated GUP_LONGTERM_BENCHMARK test. Signed-off-by: John Hubbard --- mm/gup.c | 8 ++++---- mm/gup_benchmark.c | 9 +-------- tools/testing/selftests/vm/gup_benchmark.c | 7 ++----- 3 files changed, 7 insertions(+), 17 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index e51b3820a995..9a28935a2cb1 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1744,11 +1744,11 @@ long get_user_pages(unsigned long start, unsigned long nr_pages, struct vm_area_struct **vmas) { /* - * As detailed above, FOLL_PIN must only be set internally by the - * pin_user_page*() and pin_longterm_*() APIs, never directly by the - * caller, so enforce that with an assertion: + * As detailed above, FOLL_PIN and FOLL_LONGTERM must only be set + * internally by the pin_user_page*() and pin_longterm_*() APIs, never + * directly by the caller, so enforce that with an assertion: */ - if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) + if (WARN_ON_ONCE(gup_flags & (FOLL_PIN | FOLL_LONGTERM))) return -EINVAL; return __gup_longterm_locked(current, current->mm, start, nr_pages, diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 2bb0f5df4803..de6941855b7e 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -6,7 +6,7 @@ #include #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) -#define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) +/* Command 2 has been deleted. */ #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) #define PIN_FAST_BENCHMARK _IOWR('g', 4, struct gup_benchmark) #define PIN_LONGTERM_BENCHMARK _IOWR('g', 5, struct gup_benchmark) @@ -28,7 +28,6 @@ static void put_back_pages(int cmd, struct page **pages, unsigned long nr_pages) switch (cmd) { case GUP_FAST_BENCHMARK: - case GUP_LONGTERM_BENCHMARK: case GUP_BENCHMARK: for (i = 0; i < nr_pages; i++) put_page(pages[i]); @@ -94,11 +93,6 @@ static int __gup_benchmark_ioctl(unsigned int cmd, nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i); break; - case GUP_LONGTERM_BENCHMARK: - nr = get_user_pages(addr, nr, - (gup->flags & 1) | FOLL_LONGTERM, - pages + i, NULL); - break; case GUP_BENCHMARK: nr = get_user_pages(addr, nr, gup->flags & 1, pages + i, NULL); @@ -157,7 +151,6 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd, switch (cmd) { case GUP_FAST_BENCHMARK: - case GUP_LONGTERM_BENCHMARK: case GUP_BENCHMARK: case PIN_FAST_BENCHMARK: case PIN_LONGTERM_BENCHMARK: diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index c5c934c0f402..5ef3cf8f3da5 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -15,7 +15,7 @@ #define PAGE_SIZE sysconf(_SC_PAGESIZE) #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) -#define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) +/* Command 2 has been deleted. */ #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) /* @@ -46,7 +46,7 @@ int main(int argc, char **argv) char *file = "/dev/zero"; char *p; - while ((opt = getopt(argc, argv, "m:r:n:f:abctTLUuwSH")) != -1) { + while ((opt = getopt(argc, argv, "m:r:n:f:abctTUuwSH")) != -1) { switch (opt) { case 'a': cmd = PIN_FAST_BENCHMARK; @@ -72,9 +72,6 @@ int main(int argc, char **argv) case 'T': thp = 0; break; - case 'L': - cmd = GUP_LONGTERM_BENCHMARK; - break; case 'U': cmd = GUP_BENCHMARK; break; From patchwork Wed Oct 30 22:49:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11220253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 264801747 for ; Wed, 30 Oct 2019 22:50:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0438A21924 for ; Wed, 30 Oct 2019 22:50:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="Vl2WHa6m" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727883AbfJ3WuX (ORCPT ); Wed, 30 Oct 2019 18:50:23 -0400 Received: from hqemgate16.nvidia.com ([216.228.121.65]:10956 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727763AbfJ3WuA (ORCPT ); Wed, 30 Oct 2019 18:50:00 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 30 Oct 2019 15:50:04 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 30 Oct 2019 15:49:58 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 30 Oct 2019 15:49:58 -0700 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Wed, 30 Oct 2019 22:49:57 +0000 Received: from rnnvemgw01.nvidia.com (10.128.109.123) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Wed, 30 Oct 2019 22:49:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by rnnvemgw01.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 30 Oct 2019 15:49:57 -0700 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH 19/19] Documentation/vm: add pin_user_pages.rst Date: Wed, 30 Oct 2019 15:49:30 -0700 Message-ID: <20191030224930.3990755-20-jhubbard@nvidia.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191030224930.3990755-1-jhubbard@nvidia.com> References: <20191030224930.3990755-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1572475804; bh=yoaqiDn2Wzt6dGe1BeoJvdxEEws1E68S1Y9yB9OGMfU=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=Vl2WHa6moN80c7LJGMn7iu18Qr+/T3JkBI9wfKiTKbOL4/ZPR/aLLGMH30RuT55Qq Q4bcaWyn43jChG3r5S5iL/0s/rdHfd1WT4xIwhJHHYEuIZM9HEyuRGV399H2ugyCfC KH68iqap+bJDbHZy+J7cc63wAij9Or5EhcDV1zatA69hspEHrvpyOILDhgo+wxffjz HrjRAT+eE5ftqn18EEbTrZ+bVeOqRqMaIkbseGWTN+nsjKNuLMROznKs686wPHjYKO VsKdDU62rZPEdAtwmfBj/ZN5nHLjOS6jSiASRPIUssa1DfpWJx6BOFscfy9IXsaBXD FhwSiOqM5FkUA== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Document the new pin_user_pages() and related calls and behavior. Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in this documentation. (I've reworded it and expanded on it slightly.) Cc: Jonathan Corbet Signed-off-by: John Hubbard --- Documentation/vm/index.rst | 1 + Documentation/vm/pin_user_pages.rst | 213 ++++++++++++++++++++++++++++ 2 files changed, 214 insertions(+) create mode 100644 Documentation/vm/pin_user_pages.rst diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst index e8d943b21cf9..7194efa3554a 100644 --- a/Documentation/vm/index.rst +++ b/Documentation/vm/index.rst @@ -44,6 +44,7 @@ descriptions of data structures and algorithms. page_migration page_frags page_owner + pin_user_pages remap_file_pages slub split_page_table_lock diff --git a/Documentation/vm/pin_user_pages.rst b/Documentation/vm/pin_user_pages.rst new file mode 100644 index 000000000000..7110bca3f188 --- /dev/null +++ b/Documentation/vm/pin_user_pages.rst @@ -0,0 +1,213 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================================================== +pin_user_pages() and related calls +==================================================== + +.. contents:: :local: + +Overview +======== + +This document describes the following functions: :: + + pin_user_pages + pin_user_pages_fast + pin_user_pages_remote + + pin_longterm_pages + pin_longterm_pages_fast + pin_longterm_pages_remote + +Basic description of FOLL_PIN +============================= + +A new flag for get_user_pages ("gup") has been added: FOLL_PIN. FOLL_PIN has +significant interactions and interdependencies with FOLL_LONGTERM, so both are +covered here. + +Both FOLL_PIN and FOLL_LONGTERM are "internal" to gup, meaning that neither +FOLL_PIN nor FOLL_LONGTERM should not appear at the gup call sites. This allows +the associated wrapper functions (pin_user_pages and others) to set the correct +combination of these flags, and to check for problems as well. + +FOLL_PIN and FOLL_GET are mutually exclusive for a given gup call. However, +multiple threads and call sites are free to pin the same struct pages, via both +FOLL_PIN and FOLL_GET. It's just the call site that needs to choose one or the +other, not the struct page(s). + +The FOLL_PIN implementation is nearly the same as FOLL_GET, except that FOLL_PIN +uses a different reference counting technique. + +FOLL_PIN is a prerequisite to FOLL_LONGTGERM. Another way of saying that is, +FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN. + +Which flags are set by each wrapper +=================================== + +Only FOLL_PIN and FOLL_LONGTERM are covered here. These flags are added to +whatever flags the caller provides:: + + Function gup flags (FOLL_PIN or FOLL_LONGTERM only) + -------- ------------------------------------------ + pin_user_pages FOLL_PIN + pin_user_pages_fast FOLL_PIN + pin_user_pages_remote FOLL_PIN + + pin_longterm_pages FOLL_PIN | FOLL_LONGTERM + pin_longterm_pages_fast FOLL_PIN | FOLL_LONGTERM + pin_longterm_pages_remote FOLL_PIN | FOLL_LONGTERM + +Tracking dma-pinned pages +========================= + +Some of the key design constraints, and solutions, for tracking dma-pinned +pages: + +* An actual reference count, per struct page, is required. This is because + multiple processes may pin and unpin a page. + +* False positives (reporting that a page is dma-pinned, when in fact it is not) + are acceptable, but false negatives are not. + +* struct page may not be increased in size for this, and all fields are already + used. + +* Given the above, we can overload the page->_refcount field by using, sort of, + the upper bits in that field for a dma-pinned count. "Sort of", means that, + rather than dividing page->_refcount into bit fields, we simple add a medium- + large value (GUP_PIN_COUNTING_BIAS, initially chosen to be 1024: 10 bits) to + page->_refcount. This provides fuzzy behavior: if a page has get_page() called + on it 1024 times, then it will appear to have a single dma-pinned count. + And again, that's acceptable. + +This also leads to limitations: there are only 32-10==22 bits available for a +counter that increments 10 bits at a time. + +TODO: for 1GB and larger huge pages, this is cutting it close. That's because +when pin_user_pages() follows such pages, it increments the head page by "1" +(where "1" used to mean "+1" for get_user_pages(), but now means "+1024" for +pin_user_pages()) for each tail page. So if you have a 1GB huge page: + +* There are 256K (18 bits) worth of 4 KB tail pages. +* There are 22 bits available to count up via GUP_PIN_COUNTING_BIAS (that is, + 10 bits at a time) +* There are 22 - 18 == 4 bits available to count. Except that there aren't, + because you need to allow for a few normal get_page() calls on the head page, + as well. Fortunately, the approach of using addition, rather than "hard" + bitfields, within page->_refcount, allows for sharing these bits gracefully. + But we're still looking at about 16 references. + +This, however, is a missing feature more than anything else, because it's easily +solved by addressing an obvious inefficiency in the original get_user_pages() +approach of retrieving pages: stop treating all the pages as if they were +PAGE_SIZE. Retrieve huge pages as huge pages. The callers need to be aware of +this, so some work is required. Once that's in place, this limitation mostly +disappears from view, because there will be ample refcounting range available. + +* Callers must specifically request "dma-pinned tracking of pages". In other + words, just calling get_user_pages() will not suffice; a new set of functions, + pin_user_page() and related, must be used. + +FOLL_PIN, FOLL_GET, FOLL_LONGTERM: when to use which flags +========================================================== + +Thanks to Jan Kara, Vlastimil Babka and several other -mm people, for describing +these categories: + +CASE 1: Direct IO (DIO) +----------------------- +There are GUP references to pages that are serving +as DIO buffers. These buffers are needed for a relatively short time (so they +are not "long term"). No special synchronization with page_mkclean() or +munmap() is provided. Therefore, flags to set at the call site are: :: + + FOLL_PIN + +...but rather than setting FOLL_PIN directly, call sites should use one of +the pin_user_pages*() routines that set FOLL_PIN. + +CASE 2: RDMA +------------ +There are GUP references to pages that are serving as DMA +buffers. These buffers are needed for a long time ("long term"). No special +synchronization with page_mkclean() or munmap() is provided. Therefore, flags +to set at the call site are: :: + + FOLL_PIN | FOLL_LONGTERM + +TODO: There is also a special case when the pages are DAX pages: in addition to +the above flags, the caller needs something like a layout lease on the +associated file. This is yet to be implemented. When it is implemented, it's +expected that the lease will be a prerequisite to setting FOLL_LONGTERM. + +CASE 3: ODP +----------- +(Mellanox/Infiniband On Demand Paging: the hardware supports +replayable page faulting). There are GUP references to pages serving as DMA +buffers. For ODP, MMU notifiers are used to synchronize with page_mkclean() +and munmap(). Therefore, normal GUP calls are sufficient, so neither flag +needs to be set. + +CASE 4: Pinning for struct page manipulation only +------------------------------------------------- +Here, normal GUP calls are sufficient, so neither flag needs to be set. + +page_dma_pinned(): the whole point of pinning +============================================= + +The whole point of marking pages as "DMA-pinned" or "gup-pinned" is to be able +to query, "is this page DMA-pinned?" That allows code such as page_mkclean() +(and file system writeback code in general) to make informed decisions about +what to do when a page cannot be unmapped due to such pins. + +What to do in those cases is the subject of a years-long series of discussions +and debates (see the References at the end of this document). It's a TODO item +here: fill in the details once that's worked out. Meanwhile, it's safe to say +that having this available: :: + + static inline bool page_dma_pinned(struct page *page) + +...is a prerequisite to solving the long-running gup+DMA problem. + +Another way of thinking about FOLL_GET, FOLL_PIN, and FOLL_LONGTERM +=================================================================== + +Another way of thinking about these flags is as a progression of restrictions: +FOLL_GET is for struct page manipulation, without affecting the data that the +struct page refers to. FOLL_PIN is a *replacement* for FOLL_GET, and is for +short term pins on pages whose data *will* get accessed. As such, FOLL_PIN is +a "more severe" form of pinning. And finally, FOLL_LONGTERM is an even more +restrictive case that has FOLL_PIN as a prerequisite: this is for pages that +will be pinned longterm, and whose data will be accessed. + +Unit testing +============ +This file:: + + tools/testing/selftests/vm/gup_benchmark.c + +has the following new calls to exercise the new pin*() wrapper functions: + +* PIN_FAST_BENCHMARK (./gup_benchmark -a) +* PIN_LONGTERM_BENCHMARK (./gup_benchmark -a) +* PIN_BENCHMARK (./gup_benchmark -a) + +You can monitor how many total dma-pinned pages have been acquired and released +since the system was booted, via two new /proc/vmstat entries: :: + + /proc/vmstat/nr_foll_pin_requested + /proc/vmstat/nr_foll_pin_requested + +Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is +because there is a noticeable performance drop in put_user_page(), when they +are activated. + +References +========== + +* `Some slow progress on get_user_pages() (Apr 2, 2019) `_ +* `DMA and get_user_pages() (LPC: Dec 12, 2018) `_ +* `The trouble with get_user_pages() (Apr 30, 2018) `_ + +John Hubbard, October, 2019