From patchwork Thu Nov 21 07:13:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255623 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F264112B for ; Thu, 21 Nov 2019 07:16:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6E7FD208A3 for ; Thu, 21 Nov 2019 07:16:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="gMl5VQep" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726613AbfKUHP1 (ORCPT ); Thu, 21 Nov 2019 02:15:27 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4375 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727298AbfKUHOH (ORCPT ); Thu, 21 Nov 2019 02:14:07 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:57 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from HQMAIL101.nvidia.com (172.20.187.10) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH v7 01/24] mm/gup: pass flags arg to __gup_device_* functions Date: Wed, 20 Nov 2019 23:13:31 -0800 Message-ID: <20191121071354.456618-2-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320437; bh=shQdk3aSV65MVp9nCFIR9V5hifAIbJq1aZuVoQR8zJ0=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=gMl5VQepYNVpBXofZza0/ow/8Dn935aB6KBsCZ3lc9TtHFCtZyjmhmWYKFH6IWFRl TnRRGXEo+FQ1av+GEsBdTTibE5RnfRI0tvxbsmRxdI4PCW3jO/hquda7Nqn64ZXawG T+V/mOygdWd7MqVqN4z8k9KRlTyE13FGZfZ/+PDzYWF+aFUwKAGCLnX7kwEmv54wid 3MAjQlavR9EuuEsEgQ/PitjNtZMvDTpGftK/rWJbNiEidCFo0rrQT4pTRQDQusw5qt gcHbrtSkK7sCuZHREU4ExJ34LPx8dY52thRmUp1xUZrjzb41EgN4GTlQH6yk+ltaIr FsssYdqLifkmg== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org A subsequent patch requires access to gup flags, so pass the flags argument through to the __gup_device_* functions. Also placate checkpatch.pl by shortening a nearby line. Reviewed-by: Jan Kara Reviewed-by: Jérôme Glisse Reviewed-by: Ira Weiny Cc: Kirill A. Shutemov Signed-off-by: John Hubbard --- mm/gup.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 8f236a335ae9..85caf76b3012 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1890,7 +1890,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE) static int __gup_device_huge(unsigned long pfn, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { int nr_start = *nr; struct dev_pagemap *pgmap = NULL; @@ -1916,13 +1917,14 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, } static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { unsigned long fault_pfn; int nr_start = *nr; fault_pfn = pmd_pfn(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, pages, nr)) + if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { @@ -1933,13 +1935,14 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, } static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { unsigned long fault_pfn; int nr_start = *nr; fault_pfn = pud_pfn(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, pages, nr)) + if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { @@ -1950,14 +1953,16 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, } #else static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { BUILD_BUG(); return 0; } static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { BUILD_BUG(); return 0; @@ -2062,7 +2067,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, if (pmd_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; - return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr); + return __gup_device_huge_pmd(orig, pmdp, addr, end, flags, + pages, nr); } refs = 0; @@ -2092,7 +2098,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, } static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, - unsigned long end, unsigned int flags, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { struct page *head, *page; int refs; @@ -2103,7 +2110,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, if (pud_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; - return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr); + return __gup_device_huge_pud(orig, pudp, addr, end, flags, + pages, nr); } refs = 0; From patchwork Thu Nov 21 07:13:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255705 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 746D1930 for ; Thu, 21 Nov 2019 07:17:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 475272089F for ; Thu, 21 Nov 2019 07:17:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="AABiJjmz" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727176AbfKUHRK (ORCPT ); Thu, 21 Nov 2019 02:17:10 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4296 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727173AbfKUHOD (ORCPT ); Thu, 21 Nov 2019 02:14:03 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:57 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , Christoph Hellwig , "Aneesh Kumar K . V" Subject: [PATCH v7 02/24] mm/gup: factor out duplicate code from four routines Date: Wed, 20 Nov 2019 23:13:32 -0800 Message-ID: <20191121071354.456618-3-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320437; bh=sKvmWb988RM84yP+egQhSiN41dObVjIOUVoMHE//spE=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=AABiJjmzLHly90Lc+tpSgWvQ2gkOLBSYLx+q1zdjNh8KcQhGmFmsHnn0Y5BGHpRJD lgauCr1HVm47H+VZMReB1beCsG5JEAaltjH32A42RpfeXBABxRH7jysaeIpaogLDkZ 8vVFkAkRSmHW4rTeElrnSY8BxawMss/QQfrXR6NVDNG3J/EiAwhHBmoa3hsNalR0R1 NEox93K3NNzsOwo/F3iWRgwraeHBU6qhr8MtkgZIHumtbLgVhvBfHBC2cOT7xMTeSJ FBTqNvemx2QSN1/9cmZo/T+p9xWcNyjvq/uK2CE2cqAq0m4K4xzqaaPXYYOE8SdXxC zlcdC2LZ1r8GQ== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org There are four locations in gup.c that have a fair amount of code duplication. This means that changing one requires making the same changes in four places, not to mention reading the same code four times, and wondering if there are subtle differences. Factor out the common code into static functions, thus reducing the overall line count and the code's complexity. Also, take the opportunity to slightly improve the efficiency of the error cases, by doing a mass subtraction of the refcount, surrounded by get_page()/put_page(). Also, further simplify (slightly), by waiting until the the successful end of each routine, to increment *nr. Reviewed-by: Jérôme Glisse Reviewed-by: Jan Kara Cc: Ira Weiny Cc: Christoph Hellwig Cc: Aneesh Kumar K.V Signed-off-by: John Hubbard Reviewed-by: Christoph Hellwig --- mm/gup.c | 91 ++++++++++++++++++++++---------------------------------- 1 file changed, 36 insertions(+), 55 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 85caf76b3012..f3c7d6625817 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1969,6 +1969,25 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr, } #endif +static int __record_subpages(struct page *page, unsigned long addr, + unsigned long end, struct page **pages) +{ + int nr; + + for (nr = 0; addr != end; addr += PAGE_SIZE) + pages[nr++] = page++; + + return nr; +} + +static void put_compound_head(struct page *page, int refs) +{ + /* Do a get_page() first, in case refs == page->_refcount */ + get_page(page); + page_ref_sub(page, refs); + put_page(page); +} + #ifdef CONFIG_ARCH_HAS_HUGEPD static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, unsigned long sz) @@ -1998,32 +2017,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, /* hugepages are never "special" */ VM_BUG_ON(!pfn_valid(pte_pfn(pte))); - refs = 0; head = pte_page(pte); - page = head + ((addr & (sz-1)) >> PAGE_SHIFT); - do { - VM_BUG_ON(compound_head(page) != head); - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages + *nr); head = try_get_compound_head(head, refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pte_val(pte) != pte_val(*ptep))) { - /* Could be optimized better */ - *nr -= refs; - while (refs--) - put_page(head); + put_compound_head(head, refs); return 0; } + *nr += refs; SetPageReferenced(head); return 1; } @@ -2071,28 +2078,19 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, pages, nr); } - refs = 0; page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - do { - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages + *nr); head = try_get_compound_head(pmd_page(orig), refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - *nr -= refs; - while (refs--) - put_page(head); + put_compound_head(head, refs); return 0; } + *nr += refs; SetPageReferenced(head); return 1; } @@ -2114,28 +2112,19 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, pages, nr); } - refs = 0; page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - do { - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages + *nr); head = try_get_compound_head(pud_page(orig), refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pud_val(orig) != pud_val(*pudp))) { - *nr -= refs; - while (refs--) - put_page(head); + put_compound_head(head, refs); return 0; } + *nr += refs; SetPageReferenced(head); return 1; } @@ -2151,28 +2140,20 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, return 0; BUILD_BUG_ON(pgd_devmap(orig)); - refs = 0; + page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); - do { - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); + refs = __record_subpages(page, addr, end, pages + *nr); head = try_get_compound_head(pgd_page(orig), refs); - if (!head) { - *nr -= refs; + if (!head) return 0; - } if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) { - *nr -= refs; - while (refs--) - put_page(head); + put_compound_head(head, refs); return 0; } + *nr += refs; SetPageReferenced(head); return 1; } From patchwork Thu Nov 21 07:13:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255399 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5CB6E186D for ; Thu, 21 Nov 2019 07:13:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3CF70208CE for ; Thu, 21 Nov 2019 07:13:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="GljxL876" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727071AbfKUHN6 (ORCPT ); Thu, 21 Nov 2019 02:13:58 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17691 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725842AbfKUHN6 (ORCPT ); Thu, 21 Nov 2019 02:13:58 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:52 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:55 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:55 -0800 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 03/24] mm/gup: move try_get_compound_head() to top, fix minor issues Date: Wed, 20 Nov 2019 23:13:33 -0800 Message-ID: <20191121071354.456618-4-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320432; bh=diV8s1yuXD4Hgu8vlHZLpe4KN2/SH1maxCg7V19wHIM=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=GljxL876+gvuLSjDsLvdnPqOxClI/Ba5thrKXPQx/ixqgcNRfKAZDpBQQ2os5Z7YC 7cL5zxn16JCiVQWj1ZTXK02HfMhRRPVENV9RsJZTDhO9BgwdWgiIUrazdv0CXTEbqw cEbCbcq7N8TczqO9rmLK5EQCLbtRCWXyqFrbr63Non+f3K14n+gEOHD1Rm0dtyh3Uh qDlCq399rKL3l2KPteTmRmTotzQnwBcvXn2y6BgAM8jxLiXMHE4QFqGJJCLTcDmUM5 OXMe49djzmAEEiiP7N+fQPsboKLOfSn++O/7fPnoJ8ECqzcZn5xgKwmXKO7rmFMSMz 2NTGr/KHCuFGQ== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org An upcoming patch uses try_get_compound_head() more widely, so move it to the top of gup.c. Also fix a tiny spelling error and a checkpatch.pl warning. Reviewed-by: Jan Kara Reviewed-by: Ira Weiny Signed-off-by: John Hubbard Reviewed-by: Christoph Hellwig --- mm/gup.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index f3c7d6625817..14fcdc502166 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -29,6 +29,21 @@ struct follow_page_context { unsigned int page_mask; }; +/* + * Return the compound head page with ref appropriately incremented, + * or NULL if that failed. + */ +static inline struct page *try_get_compound_head(struct page *page, int refs) +{ + struct page *head = compound_head(page); + + if (WARN_ON_ONCE(page_ref_count(head) < 0)) + return NULL; + if (unlikely(!page_cache_add_speculative(head, refs))) + return NULL; + return head; +} + /** * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages * @pages: array of pages to be maybe marked dirty, and definitely released. @@ -1793,20 +1808,6 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start, } } -/* - * Return the compund head page with ref appropriately incremented, - * or NULL if that failed. - */ -static inline struct page *try_get_compound_head(struct page *page, int refs) -{ - struct page *head = compound_head(page); - if (WARN_ON_ONCE(page_ref_count(head) < 0)) - return NULL; - if (unlikely(!page_cache_add_speculative(head, refs))) - return NULL; - return head; -} - #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) From patchwork Thu Nov 21 07:13:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255621 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 27F3014E5 for ; Thu, 21 Nov 2019 07:16:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 081E7208A3 for ; Thu, 21 Nov 2019 07:16:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="gfwggRsJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726875AbfKUHP2 (ORCPT ); Thu, 21 Nov 2019 02:15:28 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4377 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727297AbfKUHOH (ORCPT ); Thu, 21 Nov 2019 02:14:07 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:57 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , "Christoph Hellwig" Subject: [PATCH v7 04/24] mm: Cleanup __put_devmap_managed_page() vs ->page_free() Date: Wed, 20 Nov 2019 23:13:34 -0800 Message-ID: <20191121071354.456618-5-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320438; bh=tsex47R2BQW/HRhxz4q5zTudp/GozzJHGnDF0/waCbA=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=gfwggRsJX15HyLSxiO9JeEm1RWoATKK265yUsN85e8iLy27pg6fyKg9KrtNqpkKHB pYxMmtdlGr8fm55K3Lo7oiUUqMuqZdT2wTKMfESw00kzlUkmLBAMM0N09bGPTMTXZo qUUB9amn+kirK0+cMEBfzReIUodICUm5NBnecIpYgQww89jxbaXcq6ZeMZSiZQ5PqZ vVpn2BOVQrstMRjhkxBCQcUbFHE3abRCKbvjDyxLtLWZszMkxtIvQ0YsnATbK4MyII wH6MEWLkC+vQaHZGt1P24EC/HT0NhvDY9b0ibJ9NV53mmbU05ZDtJX/RiQ5IZOrFih FzYpBNJXrBOzg== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Dan Williams After the removal of the device-public infrastructure there are only 2 ->page_free() call backs in the kernel. One of those is a device-private callback in the nouveau driver, the other is a generic wakeup needed in the DAX case. In the hopes that all ->page_free() callbacks can be migrated to common core kernel functionality, move the device-private specific actions in __put_devmap_managed_page() under the is_device_private_page() conditional, including the ->page_free() callback. For the other page types just open-code the generic wakeup. Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA case. Cc: Jan Kara Cc: Christoph Hellwig Cc: Ira Weiny Reviewed-by: Christoph Hellwig Reviewed-by: Jérôme Glisse Signed-off-by: Dan Williams Signed-off-by: John Hubbard Reviewed-by: Christoph Hellwig --- drivers/nvdimm/pmem.c | 6 ---- mm/memremap.c | 80 ++++++++++++++++++++++++------------------- 2 files changed, 44 insertions(+), 42 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index f9f76f6ba07b..21db1ce8c0ae 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem) put_disk(pmem->disk); } -static void pmem_pagemap_page_free(struct page *page) -{ - wake_up_var(&page->_refcount); -} - static const struct dev_pagemap_ops fsdax_pagemap_ops = { - .page_free = pmem_pagemap_page_free, .kill = pmem_pagemap_kill, .cleanup = pmem_pagemap_cleanup, }; diff --git a/mm/memremap.c b/mm/memremap.c index 03ccbdfeb697..e899fa876a62 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void) static int devmap_managed_enable_get(struct dev_pagemap *pgmap) { - if (!pgmap->ops || !pgmap->ops->page_free) { + if (pgmap->type == MEMORY_DEVICE_PRIVATE && + (!pgmap->ops || !pgmap->ops->page_free)) { WARN(1, "Missing page_free method\n"); return -EINVAL; } @@ -414,44 +415,51 @@ void __put_devmap_managed_page(struct page *page) { int count = page_ref_dec_return(page); - /* - * If refcount is 1 then page is freed and refcount is stable as nobody - * holds a reference on the page. - */ - if (count == 1) { - /* Clear Active bit in case of parallel mark_page_accessed */ - __ClearPageActive(page); - __ClearPageWaiters(page); + /* still busy */ + if (count > 1) + return; - mem_cgroup_uncharge(page); + /* only triggered by the dev_pagemap shutdown path */ + if (count == 0) { + __put_page(page); + return; + } - /* - * When a device_private page is freed, the page->mapping field - * may still contain a (stale) mapping value. For example, the - * lower bits of page->mapping may still identify the page as - * an anonymous page. Ultimately, this entire field is just - * stale and wrong, and it will cause errors if not cleared. - * One example is: - * - * migrate_vma_pages() - * migrate_vma_insert_page() - * page_add_new_anon_rmap() - * __page_set_anon_rmap() - * ...checks page->mapping, via PageAnon(page) call, - * and incorrectly concludes that the page is an - * anonymous page. Therefore, it incorrectly, - * silently fails to set up the new anon rmap. - * - * For other types of ZONE_DEVICE pages, migration is either - * handled differently or not done at all, so there is no need - * to clear page->mapping. - */ - if (is_device_private_page(page)) - page->mapping = NULL; + /* notify page idle for dax */ + if (!is_device_private_page(page)) { + wake_up_var(&page->_refcount); + return; + } - page->pgmap->ops->page_free(page); - } else if (!count) - __put_page(page); + /* Clear Active bit in case of parallel mark_page_accessed */ + __ClearPageActive(page); + __ClearPageWaiters(page); + + mem_cgroup_uncharge(page); + + /* + * When a device_private page is freed, the page->mapping field + * may still contain a (stale) mapping value. For example, the + * lower bits of page->mapping may still identify the page as an + * anonymous page. Ultimately, this entire field is just stale + * and wrong, and it will cause errors if not cleared. One + * example is: + * + * migrate_vma_pages() + * migrate_vma_insert_page() + * page_add_new_anon_rmap() + * __page_set_anon_rmap() + * ...checks page->mapping, via PageAnon(page) call, + * and incorrectly concludes that the page is an + * anonymous page. Therefore, it incorrectly, + * silently fails to set up the new anon rmap. + * + * For other types of ZONE_DEVICE pages, migration is either + * handled differently or not done at all, so there is no need + * to clear page->mapping. + */ + page->mapping = NULL; + page->pgmap->ops->page_free(page); } EXPORT_SYMBOL(__put_devmap_managed_page); #endif /* CONFIG_DEV_PAGEMAP_OPS */ From patchwork Thu Nov 21 07:13:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2584D930 for ; Thu, 21 Nov 2019 07:18:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id ED24120898 for ; Thu, 21 Nov 2019 07:18:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="Llzls4pf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726927AbfKUHSP (ORCPT ); Thu, 21 Nov 2019 02:18:15 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17706 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726379AbfKUHN6 (ORCPT ); Thu, 21 Nov 2019 02:13:58 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:52 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , Christoph Hellwig Subject: [PATCH v7 05/24] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages Date: Wed, 20 Nov 2019 23:13:35 -0800 Message-ID: <20191121071354.456618-6-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320432; bh=byIqgGf1EJth9UpGxCEb3oipovUAmrOXJL84EZuQwTc=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=Llzls4pfdbcqqm4qnT5GuHajFXGLCPIcrUE0DkRt8l4I25CrZikNfd4Slc5Fxj9ks 6gzzgHe7RbL9/EIRW/PXjDaekiNRUKMW+Ckc0cGN0pMQkVo703PFzL3arm6aJWWH/A VtfRcgYsvHYe66fnd5NwuyVH58A0bQd0SiXHJZMJ9wQ09DZrqXiYvUdh2Bhvvx34MK nDFGiosmgFzwqEurkc0Kx/3oAHEL18Jy2n5nSX/K8CVNSvCo78qh5h9uGIrr5WHNlZ m4EtlE0rzAiGMu2Edws3eUzOJ5hGMUJHtAXstiTcCu2yNe24rcOF+tKHiGLrPDj2sp +98qaNRV16i8w== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org An upcoming patch changes and complicates the refcounting and especially the "put page" aspects of it. In order to keep everything clean, refactor the devmap page release routines: * Rename put_devmap_managed_page() to page_is_devmap_managed(), and limit the functionality to "read only": return a bool, with no side effects. * Add a new routine, put_devmap_managed_page(), to handle checking what kind of page it is, and what kind of refcount handling it requires. * Rename __put_devmap_managed_page() to free_devmap_managed_page(), and limit the functionality to unconditionally freeing a devmap page. This is originally based on a separate patch by Ira Weiny, which applied to an early version of the put_user_page() experiments. Since then, Jérôme Glisse suggested the refactoring described above. Cc: Christoph Hellwig Suggested-by: Jérôme Glisse Reviewed-by: Dan Williams Reviewed-by: Jan Kara Signed-off-by: Ira Weiny Signed-off-by: John Hubbard --- include/linux/mm.h | 27 ++++++++++++++++++++++++--- mm/memremap.c | 16 ++-------------- 2 files changed, 26 insertions(+), 17 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index a2adf95b3f9c..96228376139c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -967,9 +967,10 @@ static inline bool is_zone_device_page(const struct page *page) #endif #ifdef CONFIG_DEV_PAGEMAP_OPS -void __put_devmap_managed_page(struct page *page); +void free_devmap_managed_page(struct page *page); DECLARE_STATIC_KEY_FALSE(devmap_managed_key); -static inline bool put_devmap_managed_page(struct page *page) + +static inline bool page_is_devmap_managed(struct page *page) { if (!static_branch_unlikely(&devmap_managed_key)) return false; @@ -978,7 +979,6 @@ static inline bool put_devmap_managed_page(struct page *page) switch (page->pgmap->type) { case MEMORY_DEVICE_PRIVATE: case MEMORY_DEVICE_FS_DAX: - __put_devmap_managed_page(page); return true; default: break; @@ -986,6 +986,27 @@ static inline bool put_devmap_managed_page(struct page *page) return false; } +static inline bool put_devmap_managed_page(struct page *page) +{ + bool is_devmap = page_is_devmap_managed(page); + + if (is_devmap) { + int count = page_ref_dec_return(page); + + /* + * devmap page refcounts are 1-based, rather than 0-based: if + * refcount is 1, then the page is free and the refcount is + * stable because nobody holds a reference on the page. + */ + if (count == 1) + free_devmap_managed_page(page); + else if (!count) + __put_page(page); + } + + return is_devmap; +} + #else /* CONFIG_DEV_PAGEMAP_OPS */ static inline bool put_devmap_managed_page(struct page *page) { diff --git a/mm/memremap.c b/mm/memremap.c index e899fa876a62..2ba773859031 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -411,20 +411,8 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, EXPORT_SYMBOL_GPL(get_dev_pagemap); #ifdef CONFIG_DEV_PAGEMAP_OPS -void __put_devmap_managed_page(struct page *page) +void free_devmap_managed_page(struct page *page) { - int count = page_ref_dec_return(page); - - /* still busy */ - if (count > 1) - return; - - /* only triggered by the dev_pagemap shutdown path */ - if (count == 0) { - __put_page(page); - return; - } - /* notify page idle for dax */ if (!is_device_private_page(page)) { wake_up_var(&page->_refcount); @@ -461,5 +449,5 @@ void __put_devmap_managed_page(struct page *page) page->mapping = NULL; page->pgmap->ops->page_free(page); } -EXPORT_SYMBOL(__put_devmap_managed_page); +EXPORT_SYMBOL(free_devmap_managed_page); #endif /* CONFIG_DEV_PAGEMAP_OPS */ From patchwork Thu Nov 21 07:13:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255729 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1C09614E5 for ; Thu, 21 Nov 2019 07:17:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id ED8BF2089F for ; Thu, 21 Nov 2019 07:17:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="S9Cs+WYr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726689AbfKUHR1 (ORCPT ); Thu, 21 Nov 2019 02:17:27 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4294 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727047AbfKUHOD (ORCPT ); Thu, 21 Nov 2019 02:14:03 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:57 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL101.nvidia.com (172.20.187.10) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 06/24] goldish_pipe: rename local pin_user_pages() routine Date: Wed, 20 Nov 2019 23:13:36 -0800 Message-ID: <20191121071354.456618-7-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320437; bh=cxrYI8z67LUDSVSVMDkZ4g/06HxDdZi4HuPof8bMOLY=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=S9Cs+WYr2e3d+dBh1MPD4feJLhbOZcyw8dghn9WNYs9/Hkq96eA4nlQX8SLiJuPMe fa8TC2OrwfA87SSw1m+iKsYukesmaJF4VxTYc2pfaFnKOhqKt61a/xbPvhWQpOkCAL VCg20LTmJKdw3RpjAzj9U5M9ELKajIS1WtcQoAcsseQD/JVd/55n1TqHXnDJu8yyFP OyiG+IBq0NwMIhllCjWEjMtEz/J1hNk7eJiWOLfoTsB/ykTwHvk2fDMG6WCABXkP3K Pq8v7bJY3uphFQGkPppWp6FME5EK8w0gYsgo3pHEegqhADC2I61MQ4XxOn6zph3rKk d5kdx13cB77jg== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org 1. Avoid naming conflicts: rename local static function from "pin_user_pages()" to "pin_goldfish_pages()". An upcoming patch will introduce a global pin_user_pages() function. Reviewed-by: Jan Kara Reviewed-by: Jérôme Glisse Reviewed-by: Ira Weiny Signed-off-by: John Hubbard --- drivers/platform/goldfish/goldfish_pipe.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c index cef0133aa47a..7ed2a21a0bac 100644 --- a/drivers/platform/goldfish/goldfish_pipe.c +++ b/drivers/platform/goldfish/goldfish_pipe.c @@ -257,12 +257,12 @@ static int goldfish_pipe_error_convert(int status) } } -static int pin_user_pages(unsigned long first_page, - unsigned long last_page, - unsigned int last_page_size, - int is_write, - struct page *pages[MAX_BUFFERS_PER_COMMAND], - unsigned int *iter_last_page_size) +static int pin_goldfish_pages(unsigned long first_page, + unsigned long last_page, + unsigned int last_page_size, + int is_write, + struct page *pages[MAX_BUFFERS_PER_COMMAND], + unsigned int *iter_last_page_size) { int ret; int requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1; @@ -354,9 +354,9 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe, if (mutex_lock_interruptible(&pipe->lock)) return -ERESTARTSYS; - pages_count = pin_user_pages(first_page, last_page, - last_page_size, is_write, - pipe->pages, &iter_last_page_size); + pages_count = pin_goldfish_pages(first_page, last_page, + last_page_size, is_write, + pipe->pages, &iter_last_page_size); if (pages_count < 0) { mutex_unlock(&pipe->lock); return pages_count; From patchwork Thu Nov 21 07:13:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255483 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7C7B8930 for ; Thu, 21 Nov 2019 07:14:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5D5A3208CE for ; Thu, 21 Nov 2019 07:14:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="XWyvFHkO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727491AbfKUHOZ (ORCPT ); Thu, 21 Nov 2019 02:14:25 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4493 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727442AbfKUHOO (ORCPT ); Thu, 21 Nov 2019 02:14:14 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:14:02 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:14:01 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:14:01 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:14:00 +0000 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , "Jason Gunthorpe" Subject: [PATCH v7 07/24] IB/umem: use get_user_pages_fast() to pin DMA pages Date: Wed, 20 Nov 2019 23:13:37 -0800 Message-ID: <20191121071354.456618-8-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320442; bh=qyPjlUl206QVQFHI7iGgz2aRN3LekupswNOi4m6XHTs=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=XWyvFHkOVO1okwySDaJ3ggfbjP4DOuRf96LBPW5EPPVdUGRxB78SxXJsnJksS2DGz nvlh9Dj078w3JduJGegpHK5aInIU+jEPY5vZz6dQE6tl8KbCsXuDjDAHNh2HiCj7Vy 7UGZiM0N9XtyCm36gAVeqe6WnNAjMnb8Ogh9zu1AeSpJChkFJU+Ryzxyqpu2K/3itH 0UfkEcGHpGKeKme7S8gkyhtrjv6dtMlANoEXGhSnoM+jUByzI034bR6JkE+hh0y5HS XvRi2qeu2ccXeonWlADI+P58jx117FGKdSCECfKL9BPKKnKOk0PkFxvApX3cVwSlNO GKJ0UkFxj0rIA== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org And get rid of the mmap_sem calls, as part of that. Note that get_user_pages_fast() will, if necessary, fall back to __gup_longterm_unlocked(), which takes the mmap_sem as needed. Reviewed-by: Jan Kara Reviewed-by: Jason Gunthorpe Reviewed-by: Ira Weiny Signed-off-by: John Hubbard Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe --- drivers/infiniband/core/umem.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 24244a2f68cc..3d664a2539eb 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -271,16 +271,13 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, sg = umem->sg_head.sgl; while (npages) { - down_read(&mm->mmap_sem); - ret = get_user_pages(cur_base, - min_t(unsigned long, npages, - PAGE_SIZE / sizeof (struct page *)), - gup_flags | FOLL_LONGTERM, - page_list, NULL); - if (ret < 0) { - up_read(&mm->mmap_sem); + ret = get_user_pages_fast(cur_base, + min_t(unsigned long, npages, + PAGE_SIZE / + sizeof(struct page *)), + gup_flags | FOLL_LONGTERM, page_list); + if (ret < 0) goto umem_release; - } cur_base += ret * PAGE_SIZE; npages -= ret; @@ -288,8 +285,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, sg = ib_umem_add_sg_table(sg, page_list, ret, dma_get_max_seg_size(context->device->dma_device), &umem->sg_nents); - - up_read(&mm->mmap_sem); } sg_mark_end(sg); From patchwork Thu Nov 21 07:13:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255715 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 084E2930 for ; Thu, 21 Nov 2019 07:17:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DACF72089D for ; Thu, 21 Nov 2019 07:17:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="l02+ePxv" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727193AbfKUHOD (ORCPT ); Thu, 21 Nov 2019 02:14:03 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4295 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727175AbfKUHOC (ORCPT ); Thu, 21 Nov 2019 02:14:02 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:57 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:55 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , Hans Verkuil Subject: [PATCH v7 08/24] media/v4l2-core: set pages dirty upon releasing DMA buffers Date: Wed, 20 Nov 2019 23:13:38 -0800 Message-ID: <20191121071354.456618-9-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320437; bh=baa1oHQC+1Mr0fCveGP8H7qjkYYEgQstEl5cvxrT4lI=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=l02+ePxvyNgAM8Jq9iVmvL89ZIGCWhL51aR8TLCdsj1IuxYjPgmcTTvEcClj+kdLG yX0iZsvpJz11Jp/ma+4TLdhqCGe/biUJQLm1TzQc3ga1SDkp92M0QjeXtcDHUAbq8p 4eholacjz/7av+JG9AuYr08DNgvP+x98oMMiFLEPqwWjRxpz60A+fyLBAz1k1hAtKc xNXlJ9sUI5XAZM7ZtwcMZynCfXg5lCwiGv13JlW9HmqOhR4T6/M1CAv6IQi84ZICFl YOQiPFSfOZoS5jpUKNBMOJhfULQdszsViMhBSbqB4n4970ZOSd4O9UVriQ7pFPk/yY oSuCMn2wZuTOQ== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org After DMA is complete, and the device and CPU caches are synchronized, it's still required to mark the CPU pages as dirty, if the data was coming from the device. However, this driver was just issuing a bare put_page() call, without any set_page_dirty*() call. Fix the problem, by calling set_page_dirty_lock() if the CPU pages were potentially receiving data from the device. Acked-by: Hans Verkuil Cc: Mauro Carvalho Chehab Signed-off-by: John Hubbard Reviewed-by: Christoph Hellwig --- drivers/media/v4l2-core/videobuf-dma-sg.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 66a6c6c236a7..28262190c3ab 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -349,8 +349,11 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma) BUG_ON(dma->sglen); if (dma->pages) { - for (i = 0; i < dma->nr_pages; i++) + for (i = 0; i < dma->nr_pages; i++) { + if (dma->direction == DMA_FROM_DEVICE) + set_page_dirty_lock(dma->pages[i]); put_page(dma->pages[i]); + } kfree(dma->pages); dma->pages = NULL; } From patchwork Thu Nov 21 07:13:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255757 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F04E7930 for ; Thu, 21 Nov 2019 07:18:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C197F20898 for ; Thu, 21 Nov 2019 07:18:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="Y6IwovW0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727173AbfKUHRp (ORCPT ); Thu, 21 Nov 2019 02:17:45 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17722 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726714AbfKUHN7 (ORCPT ); Thu, 21 Nov 2019 02:13:59 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:53 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:55 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:55 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , "Jason Gunthorpe" Subject: [PATCH v7 09/24] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM Date: Wed, 20 Nov 2019 23:13:39 -0800 Message-ID: <20191121071354.456618-10-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320433; bh=Uwl18s1SSepFZDEDvXpYEZES6pW0/uOIoLMPiWyv/Qc=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=Y6IwovW0VYU+VHJ2ZkZZO4WhiYcLlNJDLX3At5D7gGMjXxPYWMnfr+PP9N1AAfxLO 98vur3h9JgtyRFONejciuxhxSQO1Pa14CyV7yQakfyXp0lFWlQzheM4YkVfWD+43IO j8hwojT+QlI1DFtKNbau0hn2YdoAVV3aerMAnO1THZmLR/I6OOm8dlbL0vb+gouocl lK12lrkUqW9V0l4c8dN7C2/K4n57zvr07C9hlu8zfG2pkhVgbNVW5GTuUpvcvAJWNJ 4OWP/Nh3bpVtRXyIyFU7kBLJemn2iYUCWJhCg/vAFVkujLxZKNvmpZ5kxUprVWzRPR E6VI3Y54AZ71g== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org As it says in the updated comment in gup.c: current FOLL_LONGTERM behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on vmas. However, the corresponding restriction in get_user_pages_remote() was slightly stricter than is actually required: it forbade all FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers that do not set the "locked" arg. Update the code and comments accordingly, and update the VFIO caller to take advantage of this, fixing a bug as a result: the VFIO caller is logically a FOLL_LONGTERM user. Also, remove an unnessary pair of calls that were releasing and reacquiring the mmap_sem. There is no need to avoid holding mmap_sem just in order to call page_to_pfn(). Also, move the DAX check ("if a VMA is DAX, don't allow long term pinning") from the VFIO call site, all the way into the internals of get_user_pages_remote() and __gup_longterm_locked(). That is: get_user_pages_remote() calls __gup_longterm_locked(), which in turn calls check_dax_vmas(). It's lightly explained in the comments as well. Thanks to Jason Gunthorpe for pointing out a clean way to fix this, and to Dan Williams for helping clarify the DAX refactoring. Reviewed-by: Jason Gunthorpe Reviewed-by: Ira Weiny Suggested-by: Jason Gunthorpe Cc: Dan Williams Cc: Jerome Glisse Signed-off-by: John Hubbard Tested-by: Alex Williamson Acked-by: Alex Williamson --- drivers/vfio/vfio_iommu_type1.c | 30 +++++------------------------- mm/gup.c | 27 ++++++++++++++++++++++----- 2 files changed, 27 insertions(+), 30 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index d864277ea16f..c7a111ad9975 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, { struct page *page[1]; struct vm_area_struct *vma; - struct vm_area_struct *vmas[1]; unsigned int flags = 0; int ret; @@ -348,33 +347,14 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, flags |= FOLL_WRITE; down_read(&mm->mmap_sem); - if (mm == current->mm) { - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, - vmas); - } else { - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, - vmas, NULL); - /* - * The lifetime of a vaddr_get_pfn() page pin is - * userspace-controlled. In the fs-dax case this could - * lead to indefinite stalls in filesystem operations. - * Disallow attempts to pin fs-dax pages via this - * interface. - */ - if (ret > 0 && vma_is_fsdax(vmas[0])) { - ret = -EOPNOTSUPP; - put_page(page[0]); - } - } - up_read(&mm->mmap_sem); - + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, + page, NULL, NULL); if (ret == 1) { *pfn = page_to_pfn(page[0]); - return 0; + ret = 0; + goto done; } - down_read(&mm->mmap_sem); - vaddr = untagged_addr(vaddr); vma = find_vma_intersection(mm, vaddr, vaddr + 1); @@ -384,7 +364,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, if (is_invalid_reserved_pfn(*pfn)) ret = 0; } - +done: up_read(&mm->mmap_sem); return ret; } diff --git a/mm/gup.c b/mm/gup.c index 14fcdc502166..cce2c9676853 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -29,6 +29,13 @@ struct follow_page_context { unsigned int page_mask; }; +static __always_inline long __gup_longterm_locked(struct task_struct *tsk, + struct mm_struct *mm, + unsigned long start, + unsigned long nr_pages, + struct page **pages, + struct vm_area_struct **vmas, + unsigned int flags); /* * Return the compound head page with ref appropriately incremented, * or NULL if that failed. @@ -1167,13 +1174,23 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, struct vm_area_struct **vmas, int *locked) { /* - * FIXME: Current FOLL_LONGTERM behavior is incompatible with + * Parts of FOLL_LONGTERM behavior are incompatible with * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on - * vmas. As there are no users of this flag in this call we simply - * disallow this option for now. + * vmas. However, this only comes up if locked is set, and there are + * callers that do request FOLL_LONGTERM, but do not set locked. So, + * allow what we can. */ - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM)) - return -EINVAL; + if (gup_flags & FOLL_LONGTERM) { + if (WARN_ON_ONCE(locked)) + return -EINVAL; + /* + * This will check the vmas (even if our vmas arg is NULL) + * and return -ENOTSUPP if DAX isn't allowed in this case: + */ + return __gup_longterm_locked(tsk, mm, start, nr_pages, pages, + vmas, gup_flags | FOLL_TOUCH | + FOLL_REMOTE); + } return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, locked, From patchwork Thu Nov 21 07:13:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255795 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E8CA4112B for ; Thu, 21 Nov 2019 07:18:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id ADFB82089D for ; Thu, 21 Nov 2019 07:18:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="KCxzxrDt" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726685AbfKUHSP (ORCPT ); Thu, 21 Nov 2019 02:18:15 -0500 Received: from hqemgate14.nvidia.com ([216.228.121.143]:12998 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726522AbfKUHN7 (ORCPT ); Thu, 21 Nov 2019 02:13:59 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:59 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , Mike Rapoport Subject: [PATCH v7 10/24] mm/gup: introduce pin_user_pages*() and FOLL_PIN Date: Wed, 20 Nov 2019 23:13:40 -0800 Message-ID: <20191121071354.456618-11-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320439; bh=yx8CpiYYilraITR3Hn93NGBbrcLuivgsZTpI2LAtgi4=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=KCxzxrDttnF+c9uqdrmUxuRQHqgwkdhUCWc3AtdGZM4ygUmd/egLrCFXQYZ5V//dZ 28sTNhJ47k7IhYbcP12prGnXIS7qhhijSUHmcR+6mtDW/0Ekz5NAI4PleBqNGu6ty+ W3f6yrvjhmjySvRtgIsH5AlLccG7l3xBlZMlmxwnEiFiWBH2e+WeVPa/C2vD/avISy 5m36S+AlVkfK2JdzBjZzoO4Szw/iw7WXRyrFaeRd6Oyau7Wp5uJoJwvM02jU5mAUM1 umfuvYIb5DVSYIHPS/nCr7zhrqqkSi5sTboBI1hmJkwJIGUaroMW+s3bNARthLdab9 5k2+nCWdkUJhQ== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Introduce pin_user_pages*() variations of get_user_pages*() calls, and also pin_longterm_pages*() variations. For now, these are placeholder calls, until the various call sites are converted to use the correct get_user_pages*() or pin_user_pages*() API. These variants will eventually all set FOLL_PIN, which is also introduced, and thoroughly documented. pin_user_pages() pin_user_pages_remote() pin_user_pages_fast() All pages that are pinned via the above calls, must be unpinned via put_user_page(). The underlying rules are: * FOLL_PIN is a gup-internal flag, so the call sites should not directly set it. That behavior is enforced with assertions. * Call sites that want to indicate that they are going to do DirectIO ("DIO") or something with similar characteristics, should call a get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers will: * Start with "pin_user_pages" instead of "get_user_pages". That makes it easy to find and audit the call sites. * Set FOLL_PIN * For pages that are received via FOLL_PIN, those pages must be returned via put_user_page(). Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in this documentation. (I've reworded it and expanded upon it.) Reviewed-by: Jan Kara Reviewed-by: Mike Rapoport # Documentation Reviewed-by: Jérôme Glisse Cc: Jonathan Corbet Cc: Ira Weiny Signed-off-by: John Hubbard --- Documentation/core-api/index.rst | 1 + Documentation/core-api/pin_user_pages.rst | 233 ++++++++++++++++++++++ include/linux/mm.h | 63 ++++-- mm/gup.c | 153 ++++++++++++-- 4 files changed, 416 insertions(+), 34 deletions(-) create mode 100644 Documentation/core-api/pin_user_pages.rst diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index ab0eae1c153a..413f7d7c8642 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -31,6 +31,7 @@ Core utilities generic-radix-tree memory-allocation mm-api + pin_user_pages gfp_mask-from-fs-io timekeeping boot-time-mm diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst new file mode 100644 index 000000000000..4f26637a5005 --- /dev/null +++ b/Documentation/core-api/pin_user_pages.rst @@ -0,0 +1,233 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================================================== +pin_user_pages() and related calls +==================================================== + +.. contents:: :local: + +Overview +======== + +This document describes the following functions: :: + + pin_user_pages + pin_user_pages_fast + pin_user_pages_remote + +Basic description of FOLL_PIN +============================= + +FOLL_PIN and FOLL_LONGTERM are flags that can be passed to the get_user_pages*() +("gup") family of functions. FOLL_PIN has significant interactions and +interdependencies with FOLL_LONGTERM, so both are covered here. + +FOLL_PIN is internal to gup, meaning that it should not appear at the gup call +sites. This allows the associated wrapper functions (pin_user_pages*() and +others) to set the correct combination of these flags, and to check for problems +as well. + +FOLL_LONGTERM, on the other hand, *is* allowed to be set at the gup call sites. +This is in order to avoid creating a large number of wrapper functions to cover +all combinations of get*(), pin*(), FOLL_LONGTERM, and more. Also, the +pin_user_pages*() APIs are clearly distinct from the get_user_pages*() APIs, so +that's a natural dividing line, and a good point to make separate wrapper calls. +In other words, use pin_user_pages*() for DMA-pinned pages, and +get_user_pages*() for other cases. There are four cases described later on in +this document, to further clarify that concept. + +FOLL_PIN and FOLL_GET are mutually exclusive for a given gup call. However, +multiple threads and call sites are free to pin the same struct pages, via both +FOLL_PIN and FOLL_GET. It's just the call site that needs to choose one or the +other, not the struct page(s). + +The FOLL_PIN implementation is nearly the same as FOLL_GET, except that FOLL_PIN +uses a different reference counting technique. + +FOLL_PIN is a prerequisite to FOLL_LONGTGERM. Another way of saying that is, +FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN. + +Which flags are set by each wrapper +=================================== + +For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup +flags the caller provides. The caller is required to pass in a non-null struct +pages* array, and the function then pin pages by incrementing each by a special +value. For now, that value is +1, just like get_user_pages*().:: + + Function + -------- + pin_user_pages FOLL_PIN is always set internally by this function. + pin_user_pages_fast FOLL_PIN is always set internally by this function. + pin_user_pages_remote FOLL_PIN is always set internally by this function. + +For these get_user_pages*() functions, FOLL_GET might not even be specified. +Behavior is a little more complex than above. If FOLL_GET was *not* specified, +but the caller passed in a non-null struct pages* array, then the function +sets FOLL_GET for you, and proceeds to pin pages by incrementing the refcount +of each page by +1.:: + + Function + -------- + get_user_pages FOLL_GET is sometimes set internally by this function. + get_user_pages_fast FOLL_GET is sometimes set internally by this function. + get_user_pages_remote FOLL_GET is sometimes set internally by this function. + +Tracking dma-pinned pages +========================= + +Some of the key design constraints, and solutions, for tracking dma-pinned +pages: + +* An actual reference count, per struct page, is required. This is because + multiple processes may pin and unpin a page. + +* False positives (reporting that a page is dma-pinned, when in fact it is not) + are acceptable, but false negatives are not. + +* struct page may not be increased in size for this, and all fields are already + used. + +* Given the above, we can overload the page->_refcount field by using, sort of, + the upper bits in that field for a dma-pinned count. "Sort of", means that, + rather than dividing page->_refcount into bit fields, we simple add a medium- + large value (GUP_PIN_COUNTING_BIAS, initially chosen to be 1024: 10 bits) to + page->_refcount. This provides fuzzy behavior: if a page has get_page() called + on it 1024 times, then it will appear to have a single dma-pinned count. + And again, that's acceptable. + +This also leads to limitations: there are only 31-10==21 bits available for a +counter that increments 10 bits at a time. + +TODO: for 1GB and larger huge pages, this is cutting it close. That's because +when pin_user_pages() follows such pages, it increments the head page by "1" +(where "1" used to mean "+1" for get_user_pages(), but now means "+1024" for +pin_user_pages()) for each tail page. So if you have a 1GB huge page: + +* There are 256K (18 bits) worth of 4 KB tail pages. +* There are 21 bits available to count up via GUP_PIN_COUNTING_BIAS (that is, + 10 bits at a time) +* There are 21 - 18 == 3 bits available to count. Except that there aren't, + because you need to allow for a few normal get_page() calls on the head page, + as well. Fortunately, the approach of using addition, rather than "hard" + bitfields, within page->_refcount, allows for sharing these bits gracefully. + But we're still looking at about 8 references. + +This, however, is a missing feature more than anything else, because it's easily +solved by addressing an obvious inefficiency in the original get_user_pages() +approach of retrieving pages: stop treating all the pages as if they were +PAGE_SIZE. Retrieve huge pages as huge pages. The callers need to be aware of +this, so some work is required. Once that's in place, this limitation mostly +disappears from view, because there will be ample refcounting range available. + +* Callers must specifically request "dma-pinned tracking of pages". In other + words, just calling get_user_pages() will not suffice; a new set of functions, + pin_user_page() and related, must be used. + +FOLL_PIN, FOLL_GET, FOLL_LONGTERM: when to use which flags +========================================================== + +Thanks to Jan Kara, Vlastimil Babka and several other -mm people, for describing +these categories: + +CASE 1: Direct IO (DIO) +----------------------- +There are GUP references to pages that are serving +as DIO buffers. These buffers are needed for a relatively short time (so they +are not "long term"). No special synchronization with page_mkclean() or +munmap() is provided. Therefore, flags to set at the call site are: :: + + FOLL_PIN + +...but rather than setting FOLL_PIN directly, call sites should use one of +the pin_user_pages*() routines that set FOLL_PIN. + +CASE 2: RDMA +------------ +There are GUP references to pages that are serving as DMA +buffers. These buffers are needed for a long time ("long term"). No special +synchronization with page_mkclean() or munmap() is provided. Therefore, flags +to set at the call site are: :: + + FOLL_PIN | FOLL_LONGTERM + +NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's +because DAX pages do not have a separate page cache, and so "pinning" implies +locking down file system blocks, which is not (yet) supported in that way. + +CASE 3: Hardware with page faulting support +------------------------------------------- +Here, a well-written driver doesn't normally need to pin pages at all. However, +if the driver does choose to do so, it can register MMU notifiers for the range, +and will be called back upon invalidation. Either way (avoiding page pinning, or +using MMU notifiers to unpin upon request), there is proper synchronization with +both filesystem and mm (page_mkclean(), munmap(), etc). + +Therefore, neither flag needs to be set. + +In this case, ideally, neither get_user_pages() nor pin_user_pages() should be +called. Instead, the software should be written so that it does not pin pages. +This allows mm and filesystems to operate more efficiently and reliably. + +CASE 4: Pinning for struct page manipulation only +------------------------------------------------- +Here, normal GUP calls are sufficient, so neither flag needs to be set. + +page_dma_pinned(): the whole point of pinning +============================================= + +The whole point of marking pages as "DMA-pinned" or "gup-pinned" is to be able +to query, "is this page DMA-pinned?" That allows code such as page_mkclean() +(and file system writeback code in general) to make informed decisions about +what to do when a page cannot be unmapped due to such pins. + +What to do in those cases is the subject of a years-long series of discussions +and debates (see the References at the end of this document). It's a TODO item +here: fill in the details once that's worked out. Meanwhile, it's safe to say +that having this available: :: + + static inline bool page_dma_pinned(struct page *page) + +...is a prerequisite to solving the long-running gup+DMA problem. + +Another way of thinking about FOLL_GET, FOLL_PIN, and FOLL_LONGTERM +=================================================================== + +Another way of thinking about these flags is as a progression of restrictions: +FOLL_GET is for struct page manipulation, without affecting the data that the +struct page refers to. FOLL_PIN is a *replacement* for FOLL_GET, and is for +short term pins on pages whose data *will* get accessed. As such, FOLL_PIN is +a "more severe" form of pinning. And finally, FOLL_LONGTERM is an even more +restrictive case that has FOLL_PIN as a prerequisite: this is for pages that +will be pinned longterm, and whose data will be accessed. + +Unit testing +============ +This file:: + + tools/testing/selftests/vm/gup_benchmark.c + +has the following new calls to exercise the new pin*() wrapper functions: + +* PIN_FAST_BENCHMARK (./gup_benchmark -a) +* PIN_LONGTERM_BENCHMARK (./gup_benchmark -a) +* PIN_BENCHMARK (./gup_benchmark -a) + +You can monitor how many total dma-pinned pages have been acquired and released +since the system was booted, via two new /proc/vmstat entries: :: + + /proc/vmstat/nr_foll_pin_requested + /proc/vmstat/nr_foll_pin_requested + +Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is +because there is a noticeable performance drop in put_user_page(), when they +are activated. + +References +========== + +* `Some slow progress on get_user_pages() (Apr 2, 2019) `_ +* `DMA and get_user_pages() (LPC: Dec 12, 2018) `_ +* `The trouble with get_user_pages() (Apr 30, 2018) `_ + +John Hubbard, October, 2019 diff --git a/include/linux/mm.h b/include/linux/mm.h index 96228376139c..568cbb895f03 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1075,16 +1075,14 @@ static inline void put_page(struct page *page) * put_user_page() - release a gup-pinned page * @page: pointer to page to be released * - * Pages that were pinned via get_user_pages*() must be released via - * either put_user_page(), or one of the put_user_pages*() routines - * below. This is so that eventually, pages that are pinned via - * get_user_pages*() can be separately tracked and uniquely handled. In - * particular, interactions with RDMA and filesystems need special - * handling. + * Pages that were pinned via pin_user_pages*() must be released via either + * put_user_page(), or one of the put_user_pages*() routines. This is so that + * eventually such pages can be separately tracked and uniquely handled. In + * particular, interactions with RDMA and filesystems need special handling. * * put_user_page() and put_page() are not interchangeable, despite this early * implementation that makes them look the same. put_user_page() calls must - * be perfectly matched up with get_user_page() calls. + * be perfectly matched up with pin*() calls. */ static inline void put_user_page(struct page *page) { @@ -1542,9 +1540,16 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *locked); +long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas, int *locked); long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); +long pin_user_pages(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas); long get_user_pages_locked(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, int *locked); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, @@ -1552,6 +1557,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, int get_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages); +int pin_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages); int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, @@ -2610,13 +2617,15 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_ANON 0x8000 /* don't do file mappings */ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ +#define FOLL_PIN 0x40000 /* pages must be released via put_user_page() */ /* - * NOTE on FOLL_LONGTERM: + * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each + * other. Here is what they mean, and how to use them: * * FOLL_LONGTERM indicates that the page will be held for an indefinite time - * period _often_ under userspace control. This is contrasted with - * iov_iter_get_pages() where usages which are transient. + * period _often_ under userspace control. This is in contrast to + * iov_iter_get_pages(), whose usages are transient. * * FIXME: For pages which are part of a filesystem, mappings are subject to the * lifetime enforced by the filesystem and we need guarantees that longterm @@ -2631,11 +2640,39 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * Currently only get_user_pages() and get_user_pages_fast() support this flag * and calls to get_user_pages_[un]locked are specifically not allowed. This * is due to an incompatibility with the FS DAX check and - * FAULT_FLAG_ALLOW_RETRY + * FAULT_FLAG_ALLOW_RETRY. * - * In the CMA case: longterm pins in a CMA region would unnecessarily fragment - * that region. And so CMA attempts to migrate the page before pinning when + * In the CMA case: long term pins in a CMA region would unnecessarily fragment + * that region. And so, CMA attempts to migrate the page before pinning, when * FOLL_LONGTERM is specified. + * + * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount, + * but an additional pin counting system) will be invoked. This is intended for + * anything that gets a page reference and then touches page data (for example, + * Direct IO). This lets the filesystem know that some non-file-system entity is + * potentially changing the pages' data. In contrast to FOLL_GET (whose pages + * are released via put_page()), FOLL_PIN pages must be released, ultimately, by + * a call to put_user_page(). + * + * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different + * and separate refcounting mechanisms, however, and that means that each has + * its own acquire and release mechanisms: + * + * FOLL_GET: get_user_pages*() to acquire, and put_page() to release. + * + * FOLL_PIN: pin_user_pages*() to acquire, and put_user_pages to release. + * + * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call. + * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based + * calls applied to them, and that's perfectly OK. This is a constraint on the + * callers, not on the pages.) + * + * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never + * directly by the caller. That's in order to help avoid mismatches when + * releasing pages: get_user_pages*() pages must be released via put_page(), + * while pin_user_pages*() pages must be released via put_user_page(). + * + * Please see Documentation/vm/pin_user_pages.rst for more information. */ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags) diff --git a/mm/gup.c b/mm/gup.c index cce2c9676853..f72d7a1635b4 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -201,6 +201,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, spinlock_t *ptl; pte_t *ptep, pte; + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return ERR_PTR(-EINVAL); retry: if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); @@ -812,7 +816,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, start = untagged_addr(start); - VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET)); + VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN))); /* * If FOLL_FORCE is set then do not force a full fault as the hinting @@ -1036,7 +1040,16 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, BUG_ON(*locked != 1); } - if (pages) + /* + * FOLL_PIN and FOLL_GET are mutually exclusive. Traditional behavior + * is to set FOLL_GET if the caller wants pages[] filled in (but has + * carelessly failed to specify FOLL_GET), so keep doing that, but only + * for FOLL_GET, not for the newer FOLL_PIN. + * + * FOLL_PIN always expects pages to be non-null, but no need to assert + * that here, as any failures will be obvious enough. + */ + if (pages && !(flags & FOLL_PIN)) flags |= FOLL_GET; pages_done = 0; @@ -1173,6 +1186,13 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *locked) { + /* + * FOLL_PIN must only be set internally by the pin_user_pages*() APIs, + * never directly by the caller, so enforce that with an assertion: + */ + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) + return -EINVAL; + /* * Parts of FOLL_LONGTERM behavior are incompatible with * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on @@ -1640,6 +1660,13 @@ long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas) { + /* + * FOLL_PIN must only be set internally by the pin_user_pages*() APIs, + * never directly by the caller, so enforce that with an assertion: + */ + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) + return -EINVAL; + return __gup_longterm_locked(current, current->mm, start, nr_pages, pages, vmas, gup_flags | FOLL_TOUCH); } @@ -2386,29 +2413,14 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages, return ret; } -/** - * get_user_pages_fast() - pin user pages in memory - * @start: starting user address - * @nr_pages: number of pages from start to pin - * @gup_flags: flags modifying pin behaviour - * @pages: array that receives pointers to the pages pinned. - * Should be at least nr_pages long. - * - * Attempt to pin user pages in memory without taking mm->mmap_sem. - * If not successful, it will fall back to taking the lock and - * calling get_user_pages(). - * - * Returns number of pages pinned. This may be fewer than the number - * requested. If nr_pages is 0 or negative, returns 0. If no pages - * were pinned, returns -errno. - */ -int get_user_pages_fast(unsigned long start, int nr_pages, - unsigned int gup_flags, struct page **pages) +static int internal_get_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, + struct page **pages) { unsigned long addr, len, end; int nr = 0, ret = 0; - if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) + if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | FOLL_PIN))) return -EINVAL; start = untagged_addr(start) & PAGE_MASK; @@ -2448,4 +2460,103 @@ int get_user_pages_fast(unsigned long start, int nr_pages, return ret; } + +/** + * get_user_pages_fast() - pin user pages in memory + * @start: starting user address + * @nr_pages: number of pages from start to pin + * @gup_flags: flags modifying pin behaviour + * @pages: array that receives pointers to the pages pinned. + * Should be at least nr_pages long. + * + * Attempt to pin user pages in memory without taking mm->mmap_sem. + * If not successful, it will fall back to taking the lock and + * calling get_user_pages(). + * + * Returns number of pages pinned. This may be fewer than the number requested. + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns + * -errno. + */ +int get_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages) +{ + /* + * FOLL_PIN must only be set internally by the pin_user_pages*() APIs, + * never directly by the caller, so enforce that: + */ + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) + return -EINVAL; + + return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages); +} EXPORT_SYMBOL_GPL(get_user_pages_fast); + +/** + * pin_user_pages_fast() - pin user pages in memory without taking locks + * + * For now, this is a placeholder function, until various call sites are + * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, + * this is identical to get_user_pages_fast(). + * + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It + * is NOT intended for Case 2 (RDMA: long-term pins). + */ +int pin_user_pages_fast(unsigned long start, int nr_pages, + unsigned int gup_flags, struct page **pages) +{ + /* + * This is a placeholder, until the pin functionality is activated. + * Until then, just behave like the corresponding get_user_pages*() + * routine. + */ + return get_user_pages_fast(start, nr_pages, gup_flags, pages); +} +EXPORT_SYMBOL_GPL(pin_user_pages_fast); + +/** + * pin_user_pages_remote() - pin pages of a remote process (task != current) + * + * For now, this is a placeholder function, until various call sites are + * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, + * this is identical to get_user_pages_remote(). + * + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It + * is NOT intended for Case 2 (RDMA: long-term pins). + */ +long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas, int *locked) +{ + /* + * This is a placeholder, until the pin functionality is activated. + * Until then, just behave like the corresponding get_user_pages*() + * routine. + */ + return get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags, pages, + vmas, locked); +} +EXPORT_SYMBOL(pin_user_pages_remote); + +/** + * pin_user_pages() - pin user pages in memory for use by other devices + * + * For now, this is a placeholder function, until various call sites are + * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, + * this is identical to get_user_pages(). + * + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It + * is NOT intended for Case 2 (RDMA: long-term pins). + */ +long pin_user_pages(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas) +{ + /* + * This is a placeholder, until the pin functionality is activated. + * Until then, just behave like the corresponding get_user_pages*() + * routine. + */ + return get_user_pages(start, nr_pages, gup_flags, pages, vmas); +} +EXPORT_SYMBOL(pin_user_pages); From patchwork Thu Nov 21 07:13:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255547 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 78158112B for ; Thu, 21 Nov 2019 07:15:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 57859208A1 for ; Thu, 21 Nov 2019 07:15:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="FJsKoBqE" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727378AbfKUHOM (ORCPT ); Thu, 21 Nov 2019 02:14:12 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4441 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726739AbfKUHOK (ORCPT ); Thu, 21 Nov 2019 02:14:10 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:58 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:56 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:56 -0800 Received: from HQMAIL101.nvidia.com (172.20.187.10) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 11/24] goldish_pipe: convert to pin_user_pages() and put_user_page() Date: Wed, 20 Nov 2019 23:13:41 -0800 Message-ID: <20191121071354.456618-12-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320438; bh=D0FnBuVDsqs8jHEGK9oTRdYqi2i3EI/a7a1G1r/Hsb0=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=FJsKoBqEkW3ESs59qC1IqSfgrVSTVrnh/YYpI98s+c+kigxMHkG12iEPhX01yxVhj XB03IvPOrjL6i3g1JS44UUIpbGXc4ODUk5+brIT53WVBA9zzGusY5JaNLf/0Y7+7H4 2a12uuEvBQtWLdf4Q9mvxXizm8ftNGoC60wm8rm8DirjpavGNso1/fBZgHniOnIGWw ZwOWppDQpqsu0W7HJbASnpF2K+OZJfo7YwNd+m04B3+tLJj30el8YGxW+fCKQHfwt1 p08s0SJVsxjzaBZ2am/2SDMafBOxZkawwDFVsMShHYVMp1/fvnDO7znpIlDg+e4Zsz SMqeyfH6NtrjQ== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org 1. Call the new global pin_user_pages_fast(), from pin_goldfish_pages(). 2. As required by pin_user_pages(), release these pages via put_user_page(). In this case, do so via put_user_pages_dirty_lock(). That has the side effect of calling set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] Another side effect is that the release code is simplified because the page[] loop is now in gup.c instead of here, so just delete the local release_user_pages() entirely, and call put_user_pages_dirty_lock() directly, instead. [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de Reviewed-by: Jan Kara Reviewed-by: Ira Weiny Signed-off-by: John Hubbard --- drivers/platform/goldfish/goldfish_pipe.c | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c index 7ed2a21a0bac..635a8bc1b480 100644 --- a/drivers/platform/goldfish/goldfish_pipe.c +++ b/drivers/platform/goldfish/goldfish_pipe.c @@ -274,7 +274,7 @@ static int pin_goldfish_pages(unsigned long first_page, *iter_last_page_size = last_page_size; } - ret = get_user_pages_fast(first_page, requested_pages, + ret = pin_user_pages_fast(first_page, requested_pages, !is_write ? FOLL_WRITE : 0, pages); if (ret <= 0) @@ -285,18 +285,6 @@ static int pin_goldfish_pages(unsigned long first_page, return ret; } -static void release_user_pages(struct page **pages, int pages_count, - int is_write, s32 consumed_size) -{ - int i; - - for (i = 0; i < pages_count; i++) { - if (!is_write && consumed_size > 0) - set_page_dirty(pages[i]); - put_page(pages[i]); - } -} - /* Populate the call parameters, merging adjacent pages together */ static void populate_rw_params(struct page **pages, int pages_count, @@ -372,7 +360,8 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe, *consumed_size = pipe->command_buffer->rw_params.consumed_size; - release_user_pages(pipe->pages, pages_count, is_write, *consumed_size); + put_user_pages_dirty_lock(pipe->pages, pages_count, + !is_write && *consumed_size > 0); mutex_unlock(&pipe->lock); return 0; From patchwork Thu Nov 21 07:13:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255511 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CD611186D for ; Thu, 21 Nov 2019 07:14:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A5312208CE for ; Thu, 21 Nov 2019 07:14:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="abj4hJXY" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726948AbfKUHOp (ORCPT ); Thu, 21 Nov 2019 02:14:45 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4488 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727419AbfKUHON (ORCPT ); Thu, 21 Nov 2019 02:14:13 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:14:00 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:59 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:59 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , "Jason Gunthorpe" Subject: [PATCH v7 12/24] IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP Date: Wed, 20 Nov 2019 23:13:42 -0800 Message-ID: <20191121071354.456618-13-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320440; bh=myfFUrXMB6aLWHiNM/hWcFnVqvce6UcHk9d4RQDaMWY=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=abj4hJXYMm947y/aprfGNual9fWZxokobjicqmcEi+Vv0kmZW5Jq05HomeGJQAZ6I rkBaGldYkD+Y6h72ZzN2NFRo8vmE3GovmjmAOQcDYYN4N9tappF0PHdWWWDEhD6mGa DhDbX6Xil8MvCanRYNjfUCoy75gW11sTTTWmJM7rVpqdpLrTpN/bJElBXe5AAckp7F r98cUznEMgvhQrW7Fh4UsOUMlCAa1XME2XryCdMQ84pKgailwCEY2VoR5F6YfhrYBl +jXrSaKB4SbcCS5j3RfjjqiUH8AdyK9DgLSi1JTSxrT77uk7iP3LPN5Mwr4nS4WoSy gyj1i5tSS4N4w== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Convert infiniband to use the new pin_user_pages*() calls. Also, revert earlier changes to Infiniband ODP that had it using put_user_page(). ODP is "Case 3" in Documentation/core-api/pin_user_pages.rst, which is to say, normal get_user_pages() and put_page() is the API to use there. The new pin_user_pages*() calls replace corresponding get_user_pages*() calls, and set the FOLL_PIN flag. The FOLL_PIN flag requires that the caller must return the pages via put_user_page*() calls, but infiniband was already doing that as part of an earlier commit. Reviewed-by: Jason Gunthorpe Signed-off-by: John Hubbard --- drivers/infiniband/core/umem.c | 2 +- drivers/infiniband/core/umem_odp.c | 13 ++++++------- drivers/infiniband/hw/hfi1/user_pages.c | 2 +- drivers/infiniband/hw/mthca/mthca_memfree.c | 2 +- drivers/infiniband/hw/qib/qib_user_pages.c | 2 +- drivers/infiniband/hw/qib/qib_user_sdma.c | 2 +- drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +- drivers/infiniband/sw/siw/siw_mem.c | 2 +- 8 files changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 3d664a2539eb..2c287ced3439 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -271,7 +271,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, sg = umem->sg_head.sgl; while (npages) { - ret = get_user_pages_fast(cur_base, + ret = pin_user_pages_fast(cur_base, min_t(unsigned long, npages, PAGE_SIZE / sizeof(struct page *)), diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 163ff7ba92b7..11249406148a 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -495,9 +495,8 @@ EXPORT_SYMBOL(ib_umem_odp_release); * The function returns -EFAULT if the DMA mapping operation fails. It returns * -EAGAIN if a concurrent invalidation prevents us from updating the page. * - * The page is released via put_user_page even if the operation failed. For - * on-demand pinning, the page is released whenever it isn't stored in the - * umem. + * The page is released via put_page even if the operation failed. For on-demand + * pinning, the page is released whenever it isn't stored in the umem. */ static int ib_umem_odp_map_dma_single_page( struct ib_umem_odp *umem_odp, @@ -542,7 +541,7 @@ static int ib_umem_odp_map_dma_single_page( } out: - put_user_page(page); + put_page(page); if (remove_existing_mapping) { ib_umem_notifier_start_account(umem_odp); @@ -665,7 +664,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, ret = -EFAULT; break; } - put_user_page(local_page_list[j]); + put_page(local_page_list[j]); continue; } @@ -692,8 +691,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, * ib_umem_odp_map_dma_single_page(). */ if (npages - (j + 1) > 0) - put_user_pages(&local_page_list[j+1], - npages - (j + 1)); + release_pages(&local_page_list[j+1], + npages - (j + 1)); break; } } diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c index 469acb961fbd..9a94761765c0 100644 --- a/drivers/infiniband/hw/hfi1/user_pages.c +++ b/drivers/infiniband/hw/hfi1/user_pages.c @@ -106,7 +106,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np int ret; unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0); - ret = get_user_pages_fast(vaddr, npages, gup_flags, pages); + ret = pin_user_pages_fast(vaddr, npages, gup_flags, pages); if (ret < 0) return ret; diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index edccfd6e178f..8269ab040c21 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -472,7 +472,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, goto out; } - ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, + ret = pin_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE | FOLL_LONGTERM, pages); if (ret < 0) goto out; diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index 6bf764e41891..7fc4b5f81fcd 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -108,7 +108,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages, down_read(¤t->mm->mmap_sem); for (got = 0; got < num_pages; got += ret) { - ret = get_user_pages(start_page + got * PAGE_SIZE, + ret = pin_user_pages(start_page + got * PAGE_SIZE, num_pages - got, FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE, p + got, NULL); diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c index 05190edc2611..1a3cc2957e3a 100644 --- a/drivers/infiniband/hw/qib/qib_user_sdma.c +++ b/drivers/infiniband/hw/qib/qib_user_sdma.c @@ -670,7 +670,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd, else j = npages; - ret = get_user_pages_fast(addr, j, FOLL_LONGTERM, pages); + ret = pin_user_pages_fast(addr, j, FOLL_LONGTERM, pages); if (ret != j) { i = 0; j = ret; diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 62e6ffa9ad78..600896727d34 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -141,7 +141,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, ret = 0; while (npages) { - ret = get_user_pages(cur_base, + ret = pin_user_pages(cur_base, min_t(unsigned long, npages, PAGE_SIZE / sizeof(struct page *)), gup_flags | FOLL_LONGTERM, diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c index e99983f07663..e53b07dcfed5 100644 --- a/drivers/infiniband/sw/siw/siw_mem.c +++ b/drivers/infiniband/sw/siw/siw_mem.c @@ -426,7 +426,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable) while (nents) { struct page **plist = &umem->page_chunk[i].plist[got]; - rv = get_user_pages(first_page_va, nents, + rv = pin_user_pages(first_page_va, nents, foll_flags | FOLL_LONGTERM, plist, NULL); if (rv < 0) From patchwork Thu Nov 21 07:13:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255669 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9D4C4930 for ; Thu, 21 Nov 2019 07:16:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7CDBE2089D for ; Thu, 21 Nov 2019 07:16:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="JSMhZVKV" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727237AbfKUHOE (ORCPT ); Thu, 21 Nov 2019 02:14:04 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17805 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727182AbfKUHOE (ORCPT ); Thu, 21 Nov 2019 02:14:04 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:53 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from HQMAIL101.nvidia.com (172.20.187.10) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 13/24] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() Date: Wed, 20 Nov 2019 23:13:43 -0800 Message-ID: <20191121071354.456618-14-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320433; bh=P8nVR+kV72iD6sM0eEn6ntwngrRbmURGxrL6qNBlbv8=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=JSMhZVKVmwMp6891HYJDTizxkwhjOngBAO/r/ip342ge19NHyGHFKmIquKO1pkPd1 Ok5q0RbZf9kyuKHiQaR4FyppFVaSCzyG8MnpGYVriFsn1bt7L57mCOCu+OaCUzlHAx 09P9Z0vW3HIMA6i3QM3Ir5rZSMB27QOUu3bu+RMmmrw5mL7rLJJwJcbgVm66POBWmo pKQC8pY3Kn2a4ab8msQFk5zzRj5tcDt1Ph/UD+oF3fh7CdCnz2yP4eEXY6mt3mxZ7t +3NLj0cP8y7t7TPZ3idcdIEsMMIurFLGvJE68kiLPr6bIiShdGbyaFhNL7lDjVtLbG 0VjlenGwE76gA== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Convert process_vm_access to use the new pin_user_pages_remote() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. Also, release the pages via put_user_page*(). Also, rename "pages" to "pinned_pages", as this makes for easier reading of process_vm_rw_single_vec(). Reviewed-by: Jan Kara Reviewed-by: Jérôme Glisse Reviewed-by: Ira Weiny Signed-off-by: John Hubbard --- mm/process_vm_access.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c index 357aa7bef6c0..fd20ab675b85 100644 --- a/mm/process_vm_access.c +++ b/mm/process_vm_access.c @@ -42,12 +42,11 @@ static int process_vm_rw_pages(struct page **pages, if (copy > len) copy = len; - if (vm_write) { + if (vm_write) copied = copy_page_from_iter(page, offset, copy, iter); - set_page_dirty_lock(page); - } else { + else copied = copy_page_to_iter(page, offset, copy, iter); - } + len -= copied; if (copied < copy && iov_iter_count(iter)) return -EFAULT; @@ -96,7 +95,7 @@ static int process_vm_rw_single_vec(unsigned long addr, flags |= FOLL_WRITE; while (!rc && nr_pages && iov_iter_count(iter)) { - int pages = min(nr_pages, max_pages_per_loop); + int pinned_pages = min(nr_pages, max_pages_per_loop); int locked = 1; size_t bytes; @@ -106,14 +105,15 @@ static int process_vm_rw_single_vec(unsigned long addr, * current/current->mm */ down_read(&mm->mmap_sem); - pages = get_user_pages_remote(task, mm, pa, pages, flags, - process_pages, NULL, &locked); + pinned_pages = pin_user_pages_remote(task, mm, pa, pinned_pages, + flags, process_pages, + NULL, &locked); if (locked) up_read(&mm->mmap_sem); - if (pages <= 0) + if (pinned_pages <= 0) return -EFAULT; - bytes = pages * PAGE_SIZE - start_offset; + bytes = pinned_pages * PAGE_SIZE - start_offset; if (bytes > len) bytes = len; @@ -122,10 +122,12 @@ static int process_vm_rw_single_vec(unsigned long addr, vm_write); len -= bytes; start_offset = 0; - nr_pages -= pages; - pa += pages * PAGE_SIZE; - while (pages) - put_page(process_pages[--pages]); + nr_pages -= pinned_pages; + pa += pinned_pages * PAGE_SIZE; + + /* If vm_write is set, the pages need to be made dirty: */ + put_user_pages_dirty_lock(process_pages, pinned_pages, + vm_write); } return rc; From patchwork Thu Nov 21 07:13:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255561 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AAF6C112B for ; Thu, 21 Nov 2019 07:15:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8A899208A1 for ; Thu, 21 Nov 2019 07:15:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="b/W5NpF8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726735AbfKUHOL (ORCPT ); Thu, 21 Nov 2019 02:14:11 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4442 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727354AbfKUHOK (ORCPT ); Thu, 21 Nov 2019 02:14:10 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:58 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , Daniel Vetter Subject: [PATCH v7 14/24] drm/via: set FOLL_PIN via pin_user_pages_fast() Date: Wed, 20 Nov 2019 23:13:44 -0800 Message-ID: <20191121071354.456618-15-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320438; bh=ky7gpstoQdJr5rtxFEQnjpS0lvKe5L6cg7jiu/68ipE=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=b/W5NpF8GDUEmpJ8O36zDZzz5riJMX7tgi2g9xl8ZnK03CSjgBYqaPDLvsSWlW/ui aDMzpxnbtQ9twvU1dadKWqbpvmzPAwZQ1yWQlYdqtNd7MxMJWNY7A34M5O0Qf6e/Ph AYFTGMGUpTL/VPDLBMCm2znB7fQZt9HrJOI5dDG6BE1HG0HYFb68e8mqXNhJY2CJaO /AODTG+M4Zhkv9nKjLEqa5Ws8hiJjWqilHAp/AIvgyWxNGAwXstBpbLNTp5RCEK/a+ vvP3UCk8F+hB1t7hCVz/W7jdeclauOC3JA2UB0tqYufCeuwKrH8Dkm6NPX0lbIwLR+ yAFvQJFUkJ0xA== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Convert drm/via to use the new pin_user_pages_fast() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). In partial anticipation of this work, the drm/via driver was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required is to change get_user_pages() to pin_user_pages(). Acked-by: Daniel Vetter Reviewed-by: Jérôme Glisse Reviewed-by: Ira Weiny Signed-off-by: John Hubbard --- drivers/gpu/drm/via/via_dmablit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c index 3db000aacd26..37c5e572993a 100644 --- a/drivers/gpu/drm/via/via_dmablit.c +++ b/drivers/gpu/drm/via/via_dmablit.c @@ -239,7 +239,7 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg, drm_via_dmablit_t *xfer) vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages)); if (NULL == vsg->pages) return -ENOMEM; - ret = get_user_pages_fast((unsigned long)xfer->mem_addr, + ret = pin_user_pages_fast((unsigned long)xfer->mem_addr, vsg->num_pages, vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0, vsg->pages); From patchwork Thu Nov 21 07:13:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255645 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D29DD112B for ; Thu, 21 Nov 2019 07:16:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B06E420898 for ; Thu, 21 Nov 2019 07:16:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="ae5eh6US" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727307AbfKUHOG (ORCPT ); Thu, 21 Nov 2019 02:14:06 -0500 Received: from hqemgate14.nvidia.com ([216.228.121.143]:13095 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727175AbfKUHOF (ORCPT ); Thu, 21 Nov 2019 02:14:05 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:14:03 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:14:00 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:14:00 -0800 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 15/24] fs/io_uring: set FOLL_PIN via pin_user_pages() Date: Wed, 20 Nov 2019 23:13:45 -0800 Message-ID: <20191121071354.456618-16-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320443; bh=/H9b+aFChAqx04E818JcymcP2r9Q91gVxUQ0f1y3FiI=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=ae5eh6USUeLSTBA3O+2eYU5oJ3XYCWbHe/Atk9InBjJQPAiJ2wQqIfoYw9wA8OMtn h3do92KKTz0rl9Tepf6TWe96vAmLS338kH0quLHfmcbMtuCZuR2iXw8vZZvi1SPmjb HltRLeLOAwa3jsD5QpzamQ67sDkFEXNsUDkn0lXposvPhEgLQuwo0s4fotTM5sLPRg c60VUD+66DCi96WxpG6Z4X/QHJmppYMBgWg8Ghav1bAvtyAlpo5RFlealsnzwOYydQ sHRYpoPL2ANz4vgyhxnqLGgOJ+X6yMO33VkURFctks1LV+tbHvRAqHkZR/Q/J6vgZT 0tUDHRblYyXuw== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Convert fs/io_uring to use the new pin_user_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). In partial anticipation of this work, the io_uring code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Reviewed-by: Jens Axboe Reviewed-by: Jan Kara Signed-off-by: John Hubbard --- fs/io_uring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 2c819c3c855d..15715eeebaec 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3449,7 +3449,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, ret = 0; down_read(¤t->mm->mmap_sem); - pret = get_user_pages(ubuf, nr_pages, + pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM, pages, vmas); if (pret == nr_pages) { From patchwork Thu Nov 21 07:13:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 696E7930 for ; Thu, 21 Nov 2019 07:18:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 489AA2089F for ; Thu, 21 Nov 2019 07:18:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="Th24DY3G" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727080AbfKUHN6 (ORCPT ); Thu, 21 Nov 2019 02:13:58 -0500 Received: from hqemgate14.nvidia.com ([216.228.121.143]:13011 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727047AbfKUHN6 (ORCPT ); Thu, 21 Nov 2019 02:13:58 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:14:00 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:56 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 16/24] net/xdp: set FOLL_PIN via pin_user_pages() Date: Wed, 20 Nov 2019 23:13:46 -0800 Message-ID: <20191121071354.456618-17-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320440; bh=JiuMSIu1KROGnYN0FUrQmmZholMbEQnfhCSo9iAG4Tw=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=Th24DY3GgZ75uiI4k8eyekw8SqOYVAgo8mkdD15klVZLgr85wzWmBlE45VW2Q+jaw GJLk0eKwHeOVTk7tjo1SrL1hu5mMioMCqX3yE3pppHj/+lh6VRqCDyoN8q3uh0/lIY YN4xX4hQ+7xAzgoaiLW0pJ9jkT03NtwLD48eohqa4K3t7iUHhGwwjW7Bkn1heUBDHn aR7Sybhw/gXzg8T4sEbjLmRKJkI4O3F6an6hvwrz1G87Exsl697qnVG/wWuOVCrr+j AT1Cle5kUpRouL5KhRurfcYlPJtgs/ZErvkOYAr0MGM8UDJlvEbAl9bUXTTa9er1GO u1Gfh1G3pEEUg== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Convert net/xdp to use the new pin_longterm_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. In partial anticipation of this work, the net/xdp code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Acked-by: Björn Töpel Signed-off-by: John Hubbard --- net/xdp/xdp_umem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 3049af269fbf..d071003b5e76 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -291,7 +291,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem) return -ENOMEM; down_read(¤t->mm->mmap_sem); - npgs = get_user_pages(umem->address, umem->npgs, + npgs = pin_user_pages(umem->address, umem->npgs, gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL); up_read(¤t->mm->mmap_sem); From patchwork Thu Nov 21 07:13:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255553 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4C39B186D for ; Thu, 21 Nov 2019 07:15:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 102B3208C0 for ; Thu, 21 Nov 2019 07:15:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="Sk4m7B4K" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727372AbfKUHOL (ORCPT ); Thu, 21 Nov 2019 02:14:11 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4443 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727355AbfKUHOK (ORCPT ); Thu, 21 Nov 2019 02:14:10 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:58 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:56 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 17/24] mm/gup: track FOLL_PIN pages Date: Wed, 20 Nov 2019 23:13:47 -0800 Message-ID: <20191121071354.456618-18-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320438; bh=/mF69U5Ee55lNMclEzShGWvimdKwuQUtkBEknX8WpR0=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=Sk4m7B4Kiqq/Qu/Ve+OyweWZ6ZplWOdAlLz+JFpswpkkE+ulwAZiYHr8Y//QyvLh7 dJDhBr+IZsbdQmp/ePuQR+1JrzlpcON2HFCuuV6AjmhQEDMvbtnTACqhiy/s9g9mh7 ywSuskVecFJPEj7YkDf0KLgWx4MF5F8vqWc94R8jBMdZ0k7LsjxU+Rp+JhZYB9itYi arcKQR3dv/7r3MQlhlLwCCr+OT6oUaPyV0gDYvv1RIjdGk3p4kWLROlBHlKXcE+Wux s+hAezzRLeYsF8d/Tz+Sn4tUx9Lmzam0WQROHmWopW9zA9Cc+ZdKuupNFH3Rv4IPRp HTQrFOmeiCehA== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Add tracking of pages that were pinned via FOLL_PIN. As mentioned in the FOLL_PIN documentation, callers who effectively set FOLL_PIN are required to ultimately free such pages via put_user_page(). The effect is similar to FOLL_GET, and may be thought of as "FOLL_GET for DIO and/or RDMA use". Pages that have been pinned via FOLL_PIN are identifiable via a new function call: bool page_dma_pinned(struct page *page); What to do in response to encountering such a page, is left to later patchsets. There is discussion about this in [1], [2], and [3]. This also changes a BUG_ON(), to a WARN_ON(), in follow_page_mask(). [1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/ [2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/ [3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/ Suggested-by: Jan Kara Suggested-by: Jérôme Glisse Signed-off-by: John Hubbard Reported-by: kbuild test robot --- Documentation/core-api/pin_user_pages.rst | 2 +- include/linux/mm.h | 113 +++++++- include/linux/mmzone.h | 2 + include/linux/page_ref.h | 10 + mm/gup.c | 323 ++++++++++++++++------ mm/huge_memory.c | 44 ++- mm/hugetlb.c | 36 ++- mm/vmstat.c | 2 + 8 files changed, 421 insertions(+), 111 deletions(-) diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst index 4f26637a5005..baa288a44a77 100644 --- a/Documentation/core-api/pin_user_pages.rst +++ b/Documentation/core-api/pin_user_pages.rst @@ -53,7 +53,7 @@ Which flags are set by each wrapper For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup flags the caller provides. The caller is required to pass in a non-null struct pages* array, and the function then pin pages by incrementing each by a special -value. For now, that value is +1, just like get_user_pages*().:: +value: GUP_PIN_COUNTING_BIAS.:: Function -------- diff --git a/include/linux/mm.h b/include/linux/mm.h index 568cbb895f03..253ec2d15f36 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1054,6 +1054,21 @@ static inline __must_check bool try_get_page(struct page *page) return true; } +__must_check bool try_pin_compound_head(struct page *page, int refs); + +/** + * try_pin_page() - mark a page as being used by pin_user_pages*(). + * + * This is the FOLL_PIN counterpart to try_get_page(). + * + * @page: pointer to page to be marked + * @Return: true for success, false for failure + */ +static inline __must_check bool try_pin_page(struct page *page) +{ + return try_pin_compound_head(page, 1); +} + static inline void put_page(struct page *page) { page = compound_head(page); @@ -1071,29 +1086,70 @@ static inline void put_page(struct page *page) __put_page(page); } -/** - * put_user_page() - release a gup-pinned page - * @page: pointer to page to be released +/* + * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload + * the page's refcount so that two separate items are tracked: the original page + * reference count, and also a new count of how many pin_user_pages() calls were + * made against the page. ("gup-pinned" is another term for the latter). * - * Pages that were pinned via pin_user_pages*() must be released via either - * put_user_page(), or one of the put_user_pages*() routines. This is so that - * eventually such pages can be separately tracked and uniquely handled. In - * particular, interactions with RDMA and filesystems need special handling. + * With this scheme, pin_user_pages() becomes special: such pages are marked as + * distinct from normal pages. As such, the put_user_page() call (and its + * variants) must be used in order to release gup-pinned pages. * - * put_user_page() and put_page() are not interchangeable, despite this early - * implementation that makes them look the same. put_user_page() calls must - * be perfectly matched up with pin*() calls. + * Choice of value: + * + * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference + * counts with respect to pin_user_pages() and put_user_page() becomes simpler, + * due to the fact that adding an even power of two to the page refcount has the + * effect of using only the upper N bits, for the code that counts up using the + * bias value. This means that the lower bits are left for the exclusive use of + * the original code that increments and decrements by one (or at least, by much + * smaller values than the bias value). + * + * Of course, once the lower bits overflow into the upper bits (and this is + * OK, because subtraction recovers the original values), then visual inspection + * no longer suffices to directly view the separate counts. However, for normal + * applications that don't have huge page reference counts, this won't be an + * issue. + * + * Locking: the lockless algorithm described in page_cache_get_speculative() + * and page_cache_gup_pin_speculative() provides safe operation for + * get_user_pages and page_mkclean and other calls that race to set up page + * table entries. */ -static inline void put_user_page(struct page *page) -{ - put_page(page); -} +#define GUP_PIN_COUNTING_BIAS (1UL << 10) +void put_user_page(struct page *page); void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, bool make_dirty); - void put_user_pages(struct page **pages, unsigned long npages); +/** + * page_dma_pinned() - report if a page is pinned for DMA. + * + * This function checks if a page has been pinned via a call to + * pin_user_pages*(). + * + * The return value is partially fuzzy: false is not fuzzy, because it means + * "definitely not pinned for DMA", but true means "probably pinned for DMA, but + * possibly a false positive due to having at least GUP_PIN_COUNTING_BIAS worth + * of normal page references". + * + * False positives are OK, because: a) it's unlikely for a page to get that many + * refcounts, and b) all the callers of this routine are expected to be able to + * deal gracefully with a false positive. + * + * For more information, please see Documentation/vm/pin_user_pages.rst. + * + * @page: pointer to page to be queried. + * @Return: True, if it is likely that the page has been "dma-pinned". + * False, if the page is definitely not dma-pinned. + */ +static inline bool page_dma_pinned(struct page *page) +{ + return (page_ref_count(compound_head(page))) >= GUP_PIN_COUNTING_BIAS; +} + #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define SECTION_IN_PAGE_FLAGS #endif @@ -2675,6 +2731,33 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * Please see Documentation/vm/pin_user_pages.rst for more information. */ +/** + * grab_page() - elevate a page's refcount by a flag-dependent amount + * + * @page: pointer to page to be grabbed + * @flags: gup flags: these are the FOLL_* flag values. + * + * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same + * time. (That's true throughout the get_user_pages*() and pin_user_pages*() + * APIs.) Cases: + * + * FOLL_GET: page's refcount will be incremented by 1. + * FOLL_PIN: page's refcount will be incremented by GUP_PIN_COUNTING_BIAS. + * + * @Return: true for for success, false for failure. If neither FOLL_GET nor + * FOLL_PIN is set, that's also considered "success", even though nothing is + * done: no page refcount changes are made in that case. + */ +static inline bool grab_page(struct page *page, unsigned int flags) +{ + if (flags & FOLL_GET) + get_page(page); + else if (flags & FOLL_PIN) + return try_pin_page(page); + + return true; +} + static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags) { if (vm_fault & VM_FAULT_OOM) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index bda20282746b..0485cba38d23 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -244,6 +244,8 @@ enum node_stat_item { NR_DIRTIED, /* page dirtyings since bootup */ NR_WRITTEN, /* page writings since bootup */ NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */ + NR_FOLL_PIN_REQUESTED, /* via: pin_user_page(), gup flag: FOLL_PIN */ + NR_FOLL_PIN_RETURNED, /* pages returned via put_user_page() */ NR_VM_NODE_STAT_ITEMS }; diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h index 14d14beb1f7f..b9cbe553d1e7 100644 --- a/include/linux/page_ref.h +++ b/include/linux/page_ref.h @@ -102,6 +102,16 @@ static inline void page_ref_sub(struct page *page, int nr) __page_ref_mod(page, -nr); } +static inline int page_ref_sub_return(struct page *page, int nr) +{ + int ret = atomic_sub_return(nr, &page->_refcount); + + if (page_ref_tracepoint_active(__tracepoint_page_ref_mod)) + __page_ref_mod(page, -nr); + + return ret; +} + static inline void page_ref_inc(struct page *page) { atomic_inc(&page->_refcount); diff --git a/mm/gup.c b/mm/gup.c index f72d7a1635b4..b06434185e33 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -51,6 +51,96 @@ static inline struct page *try_get_compound_head(struct page *page, int refs) return head; } +#ifdef CONFIG_DEBUG_VM +static inline void __update_proc_vmstat(struct page *page, + enum node_stat_item item, int count) +{ + mod_node_page_state(page_pgdat(page), item, count); +} +#else +static inline void __update_proc_vmstat(struct page *page, + enum node_stat_item item, int count) +{ +} +#endif + +/** + * try_pin_compound_head() - mark a compound page as being used by + * pin_user_pages*(). + * + * This is the FOLL_PIN counterpart to try_get_compound_head(). + * + * @page: pointer to page to be marked + * @Return: true for success, false for failure + */ +__must_check bool try_pin_compound_head(struct page *page, int refs) +{ + page = try_get_compound_head(page, GUP_PIN_COUNTING_BIAS * refs); + if (!page) + return false; + + __update_proc_vmstat(page, NR_FOLL_PIN_REQUESTED, refs); + return true; +} + +#ifdef CONFIG_DEV_PAGEMAP_OPS +static bool __put_devmap_managed_user_page(struct page *page) +{ + bool is_devmap = page_is_devmap_managed(page); + + if (is_devmap) { + int count = page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS); + + __update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1); + /* + * devmap page refcounts are 1-based, rather than 0-based: if + * refcount is 1, then the page is free and the refcount is + * stable because nobody holds a reference on the page. + */ + if (count == 1) + free_devmap_managed_page(page); + else if (!count) + __put_page(page); + } + + return is_devmap; +} +#else +static bool __put_devmap_managed_user_page(struct page *page) +{ + return false; +} +#endif /* CONFIG_DEV_PAGEMAP_OPS */ + +/** + * put_user_page() - release a dma-pinned page + * @page: pointer to page to be released + * + * Pages that were pinned via pin_user_pages*() must be released via either + * put_user_page(), or one of the put_user_pages*() routines. This is so that + * such pages can be separately tracked and uniquely handled. In particular, + * interactions with RDMA and filesystems need special handling. + */ +void put_user_page(struct page *page) +{ + page = compound_head(page); + + /* + * For devmap managed pages we need to catch refcount transition from + * GUP_PIN_COUNTING_BIAS to 1, when refcount reach one it means the + * page is free and we need to inform the device driver through + * callback. See include/linux/memremap.h and HMM for details. + */ + if (__put_devmap_managed_user_page(page)) + return; + + if (page_ref_sub_and_test(page, GUP_PIN_COUNTING_BIAS)) + __put_page(page); + + __update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1); +} +EXPORT_SYMBOL(put_user_page); + /** * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages * @pages: array of pages to be maybe marked dirty, and definitely released. @@ -237,10 +327,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } page = vm_normal_page(vma, address, pte); - if (!page && pte_devmap(pte) && (flags & FOLL_GET)) { + if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) { /* - * Only return device mapping pages in the FOLL_GET case since - * they are only valid while holding the pgmap reference. + * Only return device mapping pages in the FOLL_GET or FOLL_PIN + * case since they are only valid while holding the pgmap + * reference. */ *pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap); if (*pgmap) @@ -283,6 +374,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, page = ERR_PTR(-ENOMEM); goto out; } + } else if (flags & FOLL_PIN) { + if (unlikely(!try_pin_page(page))) { + page = ERR_PTR(-ENOMEM); + goto out; + } } if (flags & FOLL_TOUCH) { if ((flags & FOLL_WRITE) && @@ -544,8 +640,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, /* make this handle hugepd */ page = follow_huge_addr(mm, address, flags & FOLL_WRITE); if (!IS_ERR(page)) { - BUG_ON(flags & FOLL_GET); - return page; + WARN_ON_ONCE(flags & (FOLL_GET | FOLL_PIN)); + return NULL; } pgd = pgd_offset(mm, address); @@ -1125,6 +1221,36 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, return pages_done; } +static long __get_user_pages_remote(struct task_struct *tsk, + struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas, int *locked) +{ + /* + * Parts of FOLL_LONGTERM behavior are incompatible with + * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on + * vmas. However, this only comes up if locked is set, and there are + * callers that do request FOLL_LONGTERM, but do not set locked. So, + * allow what we can. + */ + if (gup_flags & FOLL_LONGTERM) { + if (WARN_ON_ONCE(locked)) + return -EINVAL; + /* + * This will check the vmas (even if our vmas arg is NULL) + * and return -ENOTSUPP if DAX isn't allowed in this case: + */ + return __gup_longterm_locked(tsk, mm, start, nr_pages, pages, + vmas, gup_flags | FOLL_TOUCH | + FOLL_REMOTE); + } + + return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, + locked, + gup_flags | FOLL_TOUCH | FOLL_REMOTE); +} + /* * get_user_pages_remote() - pin user pages in memory * @tsk: the task_struct to use for page fault accounting, or @@ -1193,28 +1319,8 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) return -EINVAL; - /* - * Parts of FOLL_LONGTERM behavior are incompatible with - * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on - * vmas. However, this only comes up if locked is set, and there are - * callers that do request FOLL_LONGTERM, but do not set locked. So, - * allow what we can. - */ - if (gup_flags & FOLL_LONGTERM) { - if (WARN_ON_ONCE(locked)) - return -EINVAL; - /* - * This will check the vmas (even if our vmas arg is NULL) - * and return -ENOTSUPP if DAX isn't allowed in this case: - */ - return __gup_longterm_locked(tsk, mm, start, nr_pages, pages, - vmas, gup_flags | FOLL_TOUCH | - FOLL_REMOTE); - } - - return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, - locked, - gup_flags | FOLL_TOUCH | FOLL_REMOTE); + return __get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags, + pages, vmas, locked); } EXPORT_SYMBOL(get_user_pages_remote); @@ -1842,13 +1948,17 @@ static inline pte_t gup_get_pte(pte_t *ptep) #endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start, + unsigned int flags, struct page **pages) { while ((*nr) - nr_start) { struct page *page = pages[--(*nr)]; ClearPageReferenced(page); - put_page(page); + if (flags & FOLL_PIN) + put_user_page(page); + else + put_page(page); } } @@ -1881,7 +1991,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, pgmap = get_dev_pagemap(pte_pfn(pte), pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); goto pte_unmap; } } else if (pte_special(pte)) @@ -1890,9 +2000,15 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - head = try_get_compound_head(page, 1); - if (!head) - goto pte_unmap; + if (flags & FOLL_PIN) { + head = page; + if (unlikely(!try_pin_page(head))) + goto pte_unmap; + } else { + head = try_get_compound_head(page, 1); + if (!head) + goto pte_unmap; + } if (unlikely(pte_val(pte) != pte_val(*ptep))) { put_page(head); @@ -1946,12 +2062,20 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, pgmap = get_dev_pagemap(pfn, pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } SetPageReferenced(page); pages[*nr] = page; - get_page(page); + + if (flags & FOLL_PIN) { + if (unlikely(!try_pin_page(page))) { + undo_dev_pagemap(nr, nr_start, flags, pages); + return 0; + } + } else + get_page(page); + (*nr)++; pfn++; } while (addr += PAGE_SIZE, addr != end); @@ -1973,7 +2097,7 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } return 1; @@ -1991,7 +2115,7 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } return 1; @@ -2014,8 +2138,8 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr, } #endif -static int __record_subpages(struct page *page, unsigned long addr, - unsigned long end, struct page **pages) +static int record_subpages(struct page *page, unsigned long addr, + unsigned long end, struct page **pages) { int nr; @@ -2025,12 +2149,31 @@ static int __record_subpages(struct page *page, unsigned long addr, return nr; } -static void put_compound_head(struct page *page, int refs) +static bool grab_compound_head(struct page *head, int refs, unsigned int flags) { + if (flags & FOLL_PIN) { + if (unlikely(!try_pin_compound_head(head, refs))) + return false; + } else { + head = try_get_compound_head(head, refs); + if (!head) + return false; + } + + return true; +} + +static void put_compound_head(struct page *page, int refs, unsigned int flags) +{ + struct page *head = compound_head(page); + + if (flags & FOLL_PIN) + refs *= GUP_PIN_COUNTING_BIAS; + /* Do a get_page() first, in case refs == page->_refcount */ - get_page(page); - page_ref_sub(page, refs); - put_page(page); + get_page(head); + page_ref_sub(head, refs); + put_page(head); } #ifdef CONFIG_ARCH_HAS_HUGEPD @@ -2064,14 +2207,13 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, head = pte_page(pte); page = head + ((addr & (sz-1)) >> PAGE_SHIFT); - refs = __record_subpages(page, addr, end, pages + *nr); + refs = record_subpages(page, addr, end, pages + *nr); - head = try_get_compound_head(head, refs); - if (!head) + if (!grab_compound_head(head, refs, flags)) return 0; if (unlikely(pte_val(pte) != pte_val(*ptep))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } @@ -2124,14 +2266,14 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, } page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - refs = __record_subpages(page, addr, end, pages + *nr); + refs = record_subpages(page, addr, end, pages + *nr); - head = try_get_compound_head(pmd_page(orig), refs); - if (!head) + head = pmd_page(orig); + if (!grab_compound_head(head, refs, flags)) return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } @@ -2158,14 +2300,14 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, } page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - refs = __record_subpages(page, addr, end, pages + *nr); + refs = record_subpages(page, addr, end, pages + *nr); - head = try_get_compound_head(pud_page(orig), refs); - if (!head) + head = pud_page(orig); + if (!grab_compound_head(head, refs, flags)) return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } @@ -2187,14 +2329,14 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, BUILD_BUG_ON(pgd_devmap(orig)); page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); - refs = __record_subpages(page, addr, end, pages + *nr); + refs = record_subpages(page, addr, end, pages + *nr); - head = try_get_compound_head(pgd_page(orig), refs); - if (!head) + head = pgd_page(orig); + if (!grab_compound_head(head, refs, flags)) return 0; if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } @@ -2494,9 +2636,12 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast); /** * pin_user_pages_fast() - pin user pages in memory without taking locks * - * For now, this is a placeholder function, until various call sites are - * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, - * this is identical to get_user_pages_fast(). + * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is set. See + * get_user_pages_fast() for documentation on the function arguments, because + * the arguments here are identical. + * + * FOLL_PIN means that the pages must be released via unpin_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for further details. * * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It * is NOT intended for Case 2 (RDMA: long-term pins). @@ -2504,21 +2649,24 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast); int pin_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages) { - /* - * This is a placeholder, until the pin functionality is activated. - * Until then, just behave like the corresponding get_user_pages*() - * routine. - */ - return get_user_pages_fast(start, nr_pages, gup_flags, pages); + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= FOLL_PIN; + return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages); } EXPORT_SYMBOL_GPL(pin_user_pages_fast); /** * pin_user_pages_remote() - pin pages of a remote process (task != current) * - * For now, this is a placeholder function, until various call sites are - * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, - * this is identical to get_user_pages_remote(). + * Nearly the same as get_user_pages_remote(), except that FOLL_PIN is set. See + * get_user_pages_remote() for documentation on the function arguments, because + * the arguments here are identical. + * + * FOLL_PIN means that the pages must be released via unpin_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for details. * * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It * is NOT intended for Case 2 (RDMA: long-term pins). @@ -2528,22 +2676,24 @@ long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *locked) { - /* - * This is a placeholder, until the pin functionality is activated. - * Until then, just behave like the corresponding get_user_pages*() - * routine. - */ - return get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags, pages, - vmas, locked); + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= FOLL_PIN; + return __get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags, + pages, vmas, locked); } EXPORT_SYMBOL(pin_user_pages_remote); /** * pin_user_pages() - pin user pages in memory for use by other devices * - * For now, this is a placeholder function, until various call sites are - * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, - * this is identical to get_user_pages(). + * Nearly the same as get_user_pages(), except that FOLL_TOUCH is not set, and + * FOLL_PIN is set. + * + * FOLL_PIN means that the pages must be released via unpin_user_page(). Please + * see Documentation/vm/pin_user_pages.rst for details. * * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It * is NOT intended for Case 2 (RDMA: long-term pins). @@ -2552,11 +2702,12 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas) { - /* - * This is a placeholder, until the pin functionality is activated. - * Until then, just behave like the corresponding get_user_pages*() - * routine. - */ - return get_user_pages(start, nr_pages, gup_flags, pages, vmas); + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) + return -EINVAL; + + gup_flags |= FOLL_PIN; + return __gup_longterm_locked(current, current->mm, start, nr_pages, + pages, vmas, gup_flags); } EXPORT_SYMBOL(pin_user_pages); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 13cc93785006..981a9ea0b83f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -945,6 +945,11 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, */ WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set"); + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return NULL; + if (flags & FOLL_WRITE && !pmd_write(*pmd)) return NULL; @@ -960,7 +965,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, * device mapped pages can only be returned if the * caller will manage the page reference count. */ - if (!(flags & FOLL_GET)) + if (!(flags & (FOLL_GET | FOLL_PIN))) return ERR_PTR(-EEXIST); pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT; @@ -968,7 +973,14 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, if (!*pgmap) return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); - get_page(page); + + /* + * grab_page() is not actually expected to fail here because we hold the + * pmd lock, so no one can unmap the pmd and free the page that it + * points to. + */ + if (!grab_page(page, flags)) + page = ERR_PTR(-EFAULT); return page; } @@ -1088,6 +1100,11 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, if (flags & FOLL_WRITE && !pud_write(*pud)) return NULL; + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return NULL; + if (pud_present(*pud) && pud_devmap(*pud)) /* pass */; else @@ -1099,8 +1116,10 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, /* * device mapped pages can only be returned if the * caller will manage the page reference count. + * + * At least one of FOLL_GET | FOLL_PIN must be set, so assert that here: */ - if (!(flags & FOLL_GET)) + if (!(flags & (FOLL_GET | FOLL_PIN))) return ERR_PTR(-EEXIST); pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT; @@ -1108,7 +1127,14 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, if (!*pgmap) return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); - get_page(page); + + /* + * grab_page() is not actually expected to fail here because we hold the + * pmd lock, so no one can unmap the pmd and free the page that it + * points to. + */ + if (!grab_page(page, flags)) + page = ERR_PTR(-EFAULT); return page; } @@ -1522,8 +1548,14 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, skip_mlock: page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT; VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page); - if (flags & FOLL_GET) - get_page(page); + + /* + * grab_page() is not actually expected to fail here because we hold the + * pmd lock, so no one can unmap the pmd and free the page that it + * points to. + */ + if (!grab_page(page, flags)) + page = ERR_PTR(-EFAULT); out: return page; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b45a95363a84..eac3310d62f5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4462,7 +4462,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, same_page: if (pages) { pages[i] = mem_map_offset(page, pfn_offset); - get_page(pages[i]); + + /* + * grab_page() is not actually expected to fail here + * because we hold the pmd lock, so no one can unmap the + * pmd and free the page that it points to. + */ + if (!grab_page(pages[i], flags)) { + spin_unlock(ptl); + remainder = 0; + err = -ENOMEM; + WARN_ON_ONCE(1); + break; + } } if (vmas) @@ -5022,6 +5034,12 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, struct page *page = NULL; spinlock_t *ptl; pte_t pte; + + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == + (FOLL_PIN | FOLL_GET))) + return NULL; + retry: ptl = pmd_lockptr(mm, pmd); spin_lock(ptl); @@ -5034,8 +5052,20 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, pte = huge_ptep_get((pte_t *)pmd); if (pte_present(pte)) { page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT); + if (flags & FOLL_GET) get_page(page); + else if (flags & FOLL_PIN) { + /* + * try_pin_page() is not actually expected to fail + * here because we hold the ptl. + */ + if (unlikely(!try_pin_page(page))) { + WARN_ON_ONCE(1); + page = NULL; + goto out; + } + } } else { if (is_hugetlb_entry_migration(pte)) { spin_unlock(ptl); @@ -5056,7 +5086,7 @@ struct page * __weak follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int flags) { - if (flags & FOLL_GET) + if (flags & (FOLL_GET | FOLL_PIN)) return NULL; return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT); @@ -5065,7 +5095,7 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address, struct page * __weak follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int flags) { - if (flags & FOLL_GET) + if (flags & (FOLL_GET | FOLL_PIN)) return NULL; return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT); diff --git a/mm/vmstat.c b/mm/vmstat.c index a8222041bd44..fdad40ccde7b 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1167,6 +1167,8 @@ const char * const vmstat_text[] = { "nr_dirtied", "nr_written", "nr_kernel_misc_reclaimable", + "nr_foll_pin_requested", + "nr_foll_pin_returned", /* enum writeback_stat_item counters */ "nr_dirty_threshold", From patchwork Thu Nov 21 07:13:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255489 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BF6C2186D for ; Thu, 21 Nov 2019 07:14:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A118C208CE for ; Thu, 21 Nov 2019 07:14:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="isoRBm62" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726529AbfKUHOZ (ORCPT ); Thu, 21 Nov 2019 02:14:25 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:4480 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727399AbfKUHOO (ORCPT ); Thu, 21 Nov 2019 02:14:14 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:58 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:56 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard , Hans Verkuil Subject: [PATCH v7 18/24] media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion Date: Wed, 20 Nov 2019 23:13:48 -0800 Message-ID: <20191121071354.456618-19-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320438; bh=sXkMwo6nCTbLEEe6A0fE3HZE6drkGcT2bf/xg2miV08=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=isoRBm62akfQQn8ZChi5BClTVDiZwqENu4cnQwBu1u/ZNe2Xuz4mRZyKdKXVls6IO tB7GIHdeQhs1onG9n01Oo2E/Fh1pUl5X4xdgPYRhaVrHimWrfVVehzM0+KSR9A/LBp vYqIRixYkkq2UMaPDPO7t3q/IS1XHcNhmeyFq5KSvHmmqFYlkDAail3i95XQ21qKT0 Uhys3VHcNvasoN6fmLaSTthn+1WBXJ763xou69tO3rs0HoOXQEpIcYGmzoeh3AYi00 u/rSZrtnQLHjfHuGKYqPTBDl8cd/7uzCmXrHijARe+2sZ18z3gh2GxsL8Q6LQvefJ/ fMjtI/AGK+yBA== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org 1. Change v4l2 from get_user_pages() to pin_user_pages(). 2. Because all FOLL_PIN-acquired pages must be released via put_user_page(), also convert the put_page() call over to put_user_pages_dirty_lock(). Acked-by: Hans Verkuil Cc: Ira Weiny Signed-off-by: John Hubbard --- drivers/media/v4l2-core/videobuf-dma-sg.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 28262190c3ab..162a2633b1e3 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -183,12 +183,12 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma, dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n", data, size, dma->nr_pages); - err = get_user_pages(data & PAGE_MASK, dma->nr_pages, + err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, flags | FOLL_LONGTERM, dma->pages, NULL); if (err != dma->nr_pages) { dma->nr_pages = (err >= 0) ? err : 0; - dprintk(1, "get_user_pages: err=%d [%d]\n", err, + dprintk(1, "pin_user_pages: err=%d [%d]\n", err, dma->nr_pages); return err < 0 ? err : -EINVAL; } @@ -349,11 +349,8 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma) BUG_ON(dma->sglen); if (dma->pages) { - for (i = 0; i < dma->nr_pages; i++) { - if (dma->direction == DMA_FROM_DEVICE) - set_page_dirty_lock(dma->pages[i]); - put_page(dma->pages[i]); - } + put_user_pages_dirty_lock(dma->pages, dma->nr_pages, + dma->direction == DMA_FROM_DEVICE); kfree(dma->pages); dma->pages = NULL; } From patchwork Thu Nov 21 07:13:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255753 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0268E14E5 for ; Thu, 21 Nov 2019 07:18:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D0A152089D for ; Thu, 21 Nov 2019 07:18:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="UDv7hXGr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726947AbfKUHRp (ORCPT ); Thu, 21 Nov 2019 02:17:45 -0500 Received: from hqemgate14.nvidia.com ([216.228.121.143]:13023 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727050AbfKUHN7 (ORCPT ); Thu, 21 Nov 2019 02:13:59 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:14:00 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:57 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 19/24] vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion Date: Wed, 20 Nov 2019 23:13:49 -0800 Message-ID: <20191121071354.456618-20-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320440; bh=nI3kBsm1YYmEZh4r0pGrnWMl68q+Rq4M7HG7pTJrYP4=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=UDv7hXGrBG/yd6ToTnVLxiTw9r1H4giBXZNVJ6DpNhyAl4KY/tyMaq+wFxnHOdMtV i6AkoGUNL3wEyFlwt9HbgsAyf/SniOPZK9HqqqTkGWvp95bKLP4ZrPVIbebeGhFVNg eSmLsDzCah1dUYnfTnfSryQIs4sk8ta+DUuaK7rV4th2Fx3ncB3J6jdaJrhY1OwcOK fgT1xS6ChSchMjYwgaCp4F9NxbbSvQLXsqEHs80t/lCoGAURk4yw10TZK1u8rEw5I4 fjDaykUkkaLbGldcR6iMqKlwWiFfTSZxzqAG1djSZcHsSLWbP7WLlivUWKovbw8neX NjXDeQTFV8djg== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org 1. Change vfio from get_user_pages_remote(), to pin_user_pages_remote(). 2. Because all FOLL_PIN-acquired pages must be released via put_user_page(), also convert the put_page() call over to put_user_pages_dirty_lock(). Note that this effectively changes the code's behavior in vfio_iommu_type1.c: put_pfn(): it now ultimately calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de Cc: Alex Williamson Signed-off-by: John Hubbard --- drivers/vfio/vfio_iommu_type1.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index c7a111ad9975..18aa36b56896 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -327,9 +327,8 @@ static int put_pfn(unsigned long pfn, int prot) { if (!is_invalid_reserved_pfn(pfn)) { struct page *page = pfn_to_page(pfn); - if (prot & IOMMU_WRITE) - SetPageDirty(page); - put_page(page); + + put_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE); return 1; } return 0; @@ -347,7 +346,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, flags |= FOLL_WRITE; down_read(&mm->mmap_sem); - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, + ret = pin_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, page, NULL, NULL); if (ret == 1) { *pfn = page_to_pfn(page[0]); From patchwork Thu Nov 21 07:13:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255693 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A3CD112B for ; Thu, 21 Nov 2019 07:17:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EC530208C0 for ; Thu, 21 Nov 2019 07:17:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="gyZm1ORN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726792AbfKUHQ5 (ORCPT ); Thu, 21 Nov 2019 02:16:57 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17803 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727179AbfKUHOD (ORCPT ); Thu, 21 Nov 2019 02:14:03 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:54 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:57 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 20 Nov 2019 23:13:57 -0800 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:57 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 20/24] powerpc: book3s64: convert to pin_user_pages() and put_user_page() Date: Wed, 20 Nov 2019 23:13:50 -0800 Message-ID: <20191121071354.456618-21-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320434; bh=IJJb9r+CEPs17zzIv7S9Yy1snV6WhIVmX2ftMMVKd1c=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=gyZm1ORNdWqyUXBSr+Yna0k2k//4YXWJXgDEB9EQLBFJPniHjnRPKBVeLivTcgvI3 D4NJYgOxxQBp6NFji5YRlWbAP2Mtmpt3rIRZVk1fEMTXNR3L0kLhiLXe4ijoMGh3ar DDt0rPeAKO880QHdhs+xMQ+vEDcZl7y7VvOnCtlAoR2B/ojcVM/dFl+MaSyttwTQQr j7b0SlxR+CLDHs/gqvO6cMaKN3CL6ntfQ3KtJgLjbz5kU6z6P/TcQA01fl8yRFvfjI Ay+gp/fYni8vweBh8jJDlb2J5fRs2a4VIKQEW5I13nv1PNtMzdg6Kh7c4CVP9XKTJw 9jyvn6+gl1xWA== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org 1. Convert from get_user_pages() to pin_user_pages(). 2. As required by pin_user_pages(), release these pages via put_user_page(). In this case, do so via put_user_pages_dirty_lock(). That has the side effect of calling set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] 3. Release each page in mem->hpages[] (instead of mem->hpas[]), because that is the array that pin_longterm_pages() filled in. This is more accurate and should be a little safer from a maintenance point of view. [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de Signed-off-by: John Hubbard --- arch/powerpc/mm/book3s64/iommu_api.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c index 56cc84520577..196383e8e5a9 100644 --- a/arch/powerpc/mm/book3s64/iommu_api.c +++ b/arch/powerpc/mm/book3s64/iommu_api.c @@ -103,7 +103,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, for (entry = 0; entry < entries; entry += chunk) { unsigned long n = min(entries - entry, chunk); - ret = get_user_pages(ua + (entry << PAGE_SHIFT), n, + ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n, FOLL_WRITE | FOLL_LONGTERM, mem->hpages + entry, NULL); if (ret == n) { @@ -167,9 +167,8 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, return 0; free_exit: - /* free the reference taken */ - for (i = 0; i < pinned; i++) - put_page(mem->hpages[i]); + /* free the references taken */ + put_user_pages(mem->hpages, pinned); vfree(mem->hpas); kfree(mem); @@ -212,10 +211,9 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem) if (!page) continue; - if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY) - SetPageDirty(page); + put_user_pages_dirty_lock(&mem->hpages[i], 1, + MM_IOMMU_TABLE_GROUP_PAGE_DIRTY); - put_page(page); mem->hpas[i] = 0; } } From patchwork Thu Nov 21 07:13:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255587 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 43F96112B for ; Thu, 21 Nov 2019 07:15:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 243B7208A3 for ; Thu, 21 Nov 2019 07:15:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="ptiOn68Z" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726803AbfKUHP2 (ORCPT ); Thu, 21 Nov 2019 02:15:28 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17869 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727305AbfKUHOH (ORCPT ); Thu, 21 Nov 2019 02:14:07 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:55 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:58 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:58 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:58 +0000 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:57 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 21/24] mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1" Date: Wed, 20 Nov 2019 23:13:51 -0800 Message-ID: <20191121071354.456618-22-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320435; bh=zwwITV4l72pt0LmhAqpHWuQbAhr06rUfgov5x98Cdpc=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=ptiOn68ZbhOBTQxUULjSoK1Oeqqj1jx7PExJLqqsJmpIYQQhI3CAGWYLOJrwAgmXL J8WFvW6lTRMtneoTZNFSlo1IUBCj1ShJrGne/GSVEvWlcUm7okWL2aYL2MP+u1D21M 3Z9bmKT4eoG2zFqfbiIRkm+Wl2iIH5NK4tAJJf/kOQOEiBcZbKra3HZB+gr7A2jGQI rDOA3VoS74L/SpxG5ogOPr+u08ABM3MXRj7Rrwy/xKnlQ9jBoyRzycFGasnBejzeMK cLqcUOnaPzOvLQo5LOtPgaXqey6ci4sZBNRmJ75gA7Y+HCYK3H3MjaGykxfSRnzv1h n1/9UhYAPrRnw== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Fix the gup benchmark flags to use the symbolic FOLL_WRITE, instead of a hard-coded "1" value. Also, clean up the filtering of gup flags a little, by just doing it once before issuing any of the get_user_pages*() calls. This makes it harder to overlook, instead of having little "gup_flags & 1" phrases in the function calls. Reviewed-by: Ira Weiny Signed-off-by: John Hubbard --- mm/gup_benchmark.c | 9 ++++++--- tools/testing/selftests/vm/gup_benchmark.c | 6 +++++- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 7dd602d7f8db..7fc44d25eca7 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -48,18 +48,21 @@ static int __gup_benchmark_ioctl(unsigned int cmd, nr = (next - addr) / PAGE_SIZE; } + /* Filter out most gup flags: only allow a tiny subset here: */ + gup->flags &= FOLL_WRITE; + switch (cmd) { case GUP_FAST_BENCHMARK: - nr = get_user_pages_fast(addr, nr, gup->flags & 1, + nr = get_user_pages_fast(addr, nr, gup->flags, pages + i); break; case GUP_LONGTERM_BENCHMARK: nr = get_user_pages(addr, nr, - (gup->flags & 1) | FOLL_LONGTERM, + gup->flags | FOLL_LONGTERM, pages + i, NULL); break; case GUP_BENCHMARK: - nr = get_user_pages(addr, nr, gup->flags & 1, pages + i, + nr = get_user_pages(addr, nr, gup->flags, pages + i, NULL); break; default: diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index 485cf06ef013..389327e9b30a 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -18,6 +18,9 @@ #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) +/* Just the flags we need, copied from mm.h: */ +#define FOLL_WRITE 0x01 /* check pte is writable */ + struct gup_benchmark { __u64 get_delta_usec; __u64 put_delta_usec; @@ -85,7 +88,8 @@ int main(int argc, char **argv) } gup.nr_pages_per_call = nr_pages; - gup.flags = write; + if (write) + gup.flags |= FOLL_WRITE; fd = open("/sys/kernel/debug/gup_benchmark", O_RDWR); if (fd == -1) From patchwork Thu Nov 21 07:13:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255581 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C16A414E5 for ; Thu, 21 Nov 2019 07:15:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 99C5520898 for ; Thu, 21 Nov 2019 07:15:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="OBYRSNb7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726690AbfKUHPO (ORCPT ); Thu, 21 Nov 2019 02:15:14 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17860 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727279AbfKUHOI (ORCPT ); Thu, 21 Nov 2019 02:14:08 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:54 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:58 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:58 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:57 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 22/24] mm/gup_benchmark: support pin_user_pages() and related calls Date: Wed, 20 Nov 2019 23:13:52 -0800 Message-ID: <20191121071354.456618-23-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320434; bh=2VikNfsXV53SqX6j47kop4rTtKhfcK4SlvrERtv5mQ4=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=OBYRSNb7XvkjtCHQfyNXnCXDl0b23Ba1L7n3qnKSSYA8cwKX8I23T3VKw6rJ0d9Mv 294+RQqWq8IF3BtmkSZjyZT4KsT8zj4cIMD2hhGv4KG0W10lLklySApaY+IsnURCi1 AR19nJz0WdvTNUiAmKezg9RWiLeOsUXgTGlDJIpRItC2S5xZkeq7riaBnAACmZf4HR Km/cwqVEEljPZ0c1uzfmkMzXS0md1z8g+MuSWABaC9tu5nVsp/mmP+ITpgZuFqJ2zn xrD1TYMqvZsjJKh7Hhd6JRxiI/8IGF+V7LYf77tkMzU4xpFJXD5B7kbogQwRbTuDlq 7G4AzUgK/J9Jw== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Up until now, gup_benchmark supported testing of the following kernel functions: * get_user_pages(): via the '-U' command line option * get_user_pages_longterm(): via the '-L' command line option * get_user_pages_fast(): as the default (no options required) Add test coverage for the new corresponding pin_*() functions: * pin_user_pages(): via the '-c' command line option * pin_user_pages_fast(): via the '-b' command line option Also, add an option for clarity: '-u' for what is now (still) the default choice: get_user_pages_fast(). Also, for the commands that set FOLL_PIN, verify that the pages really are dma-pinned, via the new is_dma_pinned() routine. Those commands are: PIN_FAST_BENCHMARK : calls pin_user_pages_fast() PIN_BENCHMARK : calls pin_user_pages() In between the calls to pin_*() and put_user_pages(), check each page: if page_dma_pinned() returns false, then WARN and return. Do this outside of the benchmark timestamps, so that it doesn't affect reported times. Cc: Ira Weiny Signed-off-by: John Hubbard --- mm/gup_benchmark.c | 65 ++++++++++++++++++++-- tools/testing/selftests/vm/gup_benchmark.c | 15 ++++- 2 files changed, 74 insertions(+), 6 deletions(-) diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 7fc44d25eca7..1ac089ad815f 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -8,6 +8,8 @@ #define GUP_FAST_BENCHMARK _IOWR('g', 1, struct gup_benchmark) #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) +#define PIN_FAST_BENCHMARK _IOWR('g', 4, struct gup_benchmark) +#define PIN_BENCHMARK _IOWR('g', 5, struct gup_benchmark) struct gup_benchmark { __u64 get_delta_usec; @@ -19,6 +21,42 @@ struct gup_benchmark { __u64 expansion[10]; /* For future use */ }; +static void put_back_pages(int cmd, struct page **pages, unsigned long nr_pages) +{ + int i; + + switch (cmd) { + case GUP_FAST_BENCHMARK: + case GUP_LONGTERM_BENCHMARK: + case GUP_BENCHMARK: + for (i = 0; i < nr_pages; i++) + put_page(pages[i]); + break; + + case PIN_FAST_BENCHMARK: + case PIN_BENCHMARK: + put_user_pages(pages, nr_pages); + break; + } +} + +static void verify_dma_pinned(int cmd, struct page **pages, + unsigned long nr_pages) +{ + int i; + + switch (cmd) { + case PIN_FAST_BENCHMARK: + case PIN_BENCHMARK: + for (i = 0; i < nr_pages; i++) { + if (WARN(!page_dma_pinned(pages[i]), + "pages[%d] is NOT dma-pinned\n", i)) + break; + } + break; + } +} + static int __gup_benchmark_ioctl(unsigned int cmd, struct gup_benchmark *gup) { @@ -65,6 +103,14 @@ static int __gup_benchmark_ioctl(unsigned int cmd, nr = get_user_pages(addr, nr, gup->flags, pages + i, NULL); break; + case PIN_FAST_BENCHMARK: + nr = pin_user_pages_fast(addr, nr, gup->flags, + pages + i); + break; + case PIN_BENCHMARK: + nr = pin_user_pages(addr, nr, gup->flags, pages + i, + NULL); + break; default: return -1; } @@ -75,15 +121,22 @@ static int __gup_benchmark_ioctl(unsigned int cmd, } end_time = ktime_get(); + /* Shifting the meaning of nr_pages: now it is actual number pinned: */ + nr_pages = i; + gup->get_delta_usec = ktime_us_delta(end_time, start_time); gup->size = addr - gup->addr; + /* + * Take an un-benchmark-timed moment to verify DMA pinned + * state: print a warning if any non-dma-pinned pages are found: + */ + verify_dma_pinned(cmd, pages, nr_pages); + start_time = ktime_get(); - for (i = 0; i < nr_pages; i++) { - if (!pages[i]) - break; - put_page(pages[i]); - } + + put_back_pages(cmd, pages, nr_pages); + end_time = ktime_get(); gup->put_delta_usec = ktime_us_delta(end_time, start_time); @@ -101,6 +154,8 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd, case GUP_FAST_BENCHMARK: case GUP_LONGTERM_BENCHMARK: case GUP_BENCHMARK: + case PIN_FAST_BENCHMARK: + case PIN_BENCHMARK: break; default: return -EINVAL; diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c index 389327e9b30a..43b4dfe161a2 100644 --- a/tools/testing/selftests/vm/gup_benchmark.c +++ b/tools/testing/selftests/vm/gup_benchmark.c @@ -18,6 +18,10 @@ #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) +/* Similar to above, but use FOLL_PIN instead of FOLL_GET. */ +#define PIN_FAST_BENCHMARK _IOWR('g', 4, struct gup_benchmark) +#define PIN_BENCHMARK _IOWR('g', 5, struct gup_benchmark) + /* Just the flags we need, copied from mm.h: */ #define FOLL_WRITE 0x01 /* check pte is writable */ @@ -40,8 +44,14 @@ int main(int argc, char **argv) char *file = "/dev/zero"; char *p; - while ((opt = getopt(argc, argv, "m:r:n:f:tTLUwSH")) != -1) { + while ((opt = getopt(argc, argv, "m:r:n:f:abtTLUuwSH")) != -1) { switch (opt) { + case 'a': + cmd = PIN_FAST_BENCHMARK; + break; + case 'b': + cmd = PIN_BENCHMARK; + break; case 'm': size = atoi(optarg) * MB; break; @@ -63,6 +73,9 @@ int main(int argc, char **argv) case 'U': cmd = GUP_BENCHMARK; break; + case 'u': + cmd = GUP_FAST_BENCHMARK; + break; case 'w': write = 1; break; From patchwork Thu Nov 21 07:13:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255681 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D591D930 for ; Thu, 21 Nov 2019 07:16:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B11F2208A1 for ; Thu, 21 Nov 2019 07:16:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="BN+PST3f" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727104AbfKUHQr (ORCPT ); Thu, 21 Nov 2019 02:16:47 -0500 Received: from hqemgate15.nvidia.com ([216.228.121.64]:17804 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727176AbfKUHOE (ORCPT ); Thu, 21 Nov 2019 02:14:04 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:13:54 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:58 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:58 -0800 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:57 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 23/24] selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage Date: Wed, 20 Nov 2019 23:13:53 -0800 Message-ID: <20191121071354.456618-24-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320434; bh=/x1TdCiN4AHV7riFBKV40yDsqqnISHzJ+pPbTJijsU0=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=BN+PST3ffquUSkPWq0lMQ+aWC4m7ax2ZfA1EE2z2CvN6OOI0TwbXYpoM35w8Ibstb LzDd4IqHmBpLXZWGxRMMs0/24c5N7Rs6yrgb8NKMwg/MFI/OIcmvEyTFoju6rxd2uM dEtO/MIYHV+8qy+u6nNBWdHAQPtYjQUheAdkCVNmg1li29PuH7askTSgXJlX4uQwh7 oH+6wkzLufN2cqs+gubaGFYZy9/w9e52yLdLNCOq7NfCtI06N1YDiTXZNikXYsLQDY C9IKh5qRitaKFp8t4Q4iIQJrfMohxxMopR1PjCgCAyWtnOA8k3vWVTktk3o6s7f1Ia SdcMg1CzlS3sA== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org It's good to have basic unit test coverage of the new FOLL_PIN behavior. Fortunately, the gup_benchmark unit test is extremely fast (a few milliseconds), so adding it the the run_vmtests suite is going to cause no noticeable change in running time. So, add two new invocations to run_vmtests: 1) Run gup_benchmark with normal get_user_pages(). 2) Run gup_benchmark with pin_user_pages(). This is much like the first call, except that it sets FOLL_PIN. Running these two in quick succession also provide a visual comparison of the running times, which is convenient. The new invocations are fairly early in the run_vmtests script, because with test suites, it's usually preferable to put the shorter, faster tests first, all other things being equal. Reviewed-by: Ira Weiny Signed-off-by: John Hubbard --- tools/testing/selftests/vm/run_vmtests | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests index 951c507a27f7..5043347397a6 100755 --- a/tools/testing/selftests/vm/run_vmtests +++ b/tools/testing/selftests/vm/run_vmtests @@ -104,6 +104,28 @@ echo "NOTE: The above hugetlb tests provide minimal coverage. Use" echo " https://github.com/libhugetlbfs/libhugetlbfs.git for" echo " hugetlb regression testing." +echo "--------------------------------------------" +echo "running 'gup_benchmark -U' (normal/slow gup)" +echo "--------------------------------------------" +./gup_benchmark -U +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + +echo "------------------------------------------" +echo "running gup_benchmark -b (pin_user_pages)" +echo "------------------------------------------" +./gup_benchmark -b +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + echo "-------------------" echo "running userfaultfd" echo "-------------------" From patchwork Thu Nov 21 07:13:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 11255657 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AAFDD930 for ; Thu, 21 Nov 2019 07:16:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6B705208A1 for ; Thu, 21 Nov 2019 07:16:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="SbUQWB4H" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726541AbfKUHQ1 (ORCPT ); Thu, 21 Nov 2019 02:16:27 -0500 Received: from hqemgate14.nvidia.com ([216.228.121.143]:13100 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727242AbfKUHOH (ORCPT ); Thu, 21 Nov 2019 02:14:07 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 20 Nov 2019 23:14:01 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 20 Nov 2019 23:13:58 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 20 Nov 2019 23:13:58 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 21 Nov 2019 07:13:57 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 21 Nov 2019 07:13:57 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7,5,8,10121) id ; Wed, 20 Nov 2019 23:13:57 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?utf-8?b?QmrDtnJuIFQ=?= =?utf-8?b?w7ZwZWw=?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , John Hubbard Subject: [PATCH v7 24/24] mm, tree-wide: rename put_user_page*() to unpin_user_page*() Date: Wed, 20 Nov 2019 23:13:54 -0800 Message-ID: <20191121071354.456618-25-jhubbard@nvidia.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191121071354.456618-1-jhubbard@nvidia.com> References: <20191121071354.456618-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1574320441; bh=JuszxZi+ydw4gHGCPcQJyQLM1fEBAIwayXEIKXFk0NE=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=SbUQWB4HVX+SvRZCvuezLgjDV4EBGIYtTowE21urk2g4Ajlebskycyq+OT1I8Cs6g 3bxiejTnZosBK0SLLDwXwW8kZ3kSXsZRfAAgGBysZ0DBIKzxRBG4BfTcOiZ/yTHXu0 npDUcMp6TpFoy/HAj8yaljA8cTQ75tcaNWMr+H5AQIrI07t5xP1LxNRtD8PcEJMkTM fxqAhjccXeL3i53zDtgUce43W2IcDXFQVgZjHfGkJud9ye3QX3tdQicsdn3LRyk/vu Fi+gpcl3O2dMpcvQLgObQFX4m+9r5TgNXn894MDGRdpqyLvuQJs5V75sbGKWZcdl24 MGARCgDaAXP5w== Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Reviewed-by: Jan Kara Signed-off-by: John Hubbard --- Documentation/core-api/pin_user_pages.rst | 2 +- arch/powerpc/mm/book3s64/iommu_api.c | 6 +-- drivers/gpu/drm/via/via_dmablit.c | 4 +- drivers/infiniband/core/umem.c | 2 +- drivers/infiniband/hw/hfi1/user_pages.c | 2 +- drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +-- drivers/infiniband/hw/qib/qib_user_pages.c | 2 +- drivers/infiniband/hw/qib/qib_user_sdma.c | 6 +-- drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +- drivers/infiniband/sw/siw/siw_mem.c | 2 +- drivers/media/v4l2-core/videobuf-dma-sg.c | 4 +- drivers/platform/goldfish/goldfish_pipe.c | 4 +- drivers/vfio/vfio_iommu_type1.c | 2 +- fs/io_uring.c | 4 +- include/linux/mm.h | 30 +++++++------- include/linux/mmzone.h | 2 +- mm/gup.c | 46 ++++++++++----------- mm/gup_benchmark.c | 2 +- mm/process_vm_access.c | 4 +- net/xdp/xdp_umem.c | 2 +- 20 files changed, 67 insertions(+), 67 deletions(-) diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst index baa288a44a77..6d93ef203561 100644 --- a/Documentation/core-api/pin_user_pages.rst +++ b/Documentation/core-api/pin_user_pages.rst @@ -220,7 +220,7 @@ since the system was booted, via two new /proc/vmstat entries: :: /proc/vmstat/nr_foll_pin_requested Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is -because there is a noticeable performance drop in put_user_page(), when they +because there is a noticeable performance drop in unpin_user_page(), when they are activated. References diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c index 196383e8e5a9..dd7aa5a4f33c 100644 --- a/arch/powerpc/mm/book3s64/iommu_api.c +++ b/arch/powerpc/mm/book3s64/iommu_api.c @@ -168,7 +168,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, free_exit: /* free the references taken */ - put_user_pages(mem->hpages, pinned); + unpin_user_pages(mem->hpages, pinned); vfree(mem->hpas); kfree(mem); @@ -211,8 +211,8 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem) if (!page) continue; - put_user_pages_dirty_lock(&mem->hpages[i], 1, - MM_IOMMU_TABLE_GROUP_PAGE_DIRTY); + unpin_user_pages_dirty_lock(&mem->hpages[i], 1, + MM_IOMMU_TABLE_GROUP_PAGE_DIRTY); mem->hpas[i] = 0; } diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c index 37c5e572993a..719d036c9384 100644 --- a/drivers/gpu/drm/via/via_dmablit.c +++ b/drivers/gpu/drm/via/via_dmablit.c @@ -188,8 +188,8 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg) kfree(vsg->desc_pages); /* fall through */ case dr_via_pages_locked: - put_user_pages_dirty_lock(vsg->pages, vsg->num_pages, - (vsg->direction == DMA_FROM_DEVICE)); + unpin_user_pages_dirty_lock(vsg->pages, vsg->num_pages, + (vsg->direction == DMA_FROM_DEVICE)); /* fall through */ case dr_via_pages_alloc: vfree(vsg->pages); diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 2c287ced3439..119a147da904 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -54,7 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) { page = sg_page_iter_page(&sg_iter); - put_user_pages_dirty_lock(&page, 1, umem->writable && dirty); + unpin_user_pages_dirty_lock(&page, 1, umem->writable && dirty); } sg_free_table(&umem->sg_head); diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c index 9a94761765c0..3b505006c0a6 100644 --- a/drivers/infiniband/hw/hfi1/user_pages.c +++ b/drivers/infiniband/hw/hfi1/user_pages.c @@ -118,7 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np void hfi1_release_user_pages(struct mm_struct *mm, struct page **p, size_t npages, bool dirty) { - put_user_pages_dirty_lock(p, npages, dirty); + unpin_user_pages_dirty_lock(p, npages, dirty); if (mm) { /* during close after signal, mm can be NULL */ atomic64_sub(npages, &mm->pinned_vm); diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index 8269ab040c21..78a48aea3faf 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -482,7 +482,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); if (ret < 0) { - put_user_page(pages[0]); + unpin_user_page(pages[0]); goto out; } @@ -490,7 +490,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, mthca_uarc_virt(dev, uar, i)); if (ret) { pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); - put_user_page(sg_page(&db_tab->page[i].mem)); + unpin_user_page(sg_page(&db_tab->page[i].mem)); goto out; } @@ -556,7 +556,7 @@ void mthca_cleanup_user_db_tab(struct mthca_dev *dev, struct mthca_uar *uar, if (db_tab->page[i].uvirt) { mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1); pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); - put_user_page(sg_page(&db_tab->page[i].mem)); + unpin_user_page(sg_page(&db_tab->page[i].mem)); } } diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index 7fc4b5f81fcd..342e3172ca40 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -40,7 +40,7 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages, int dirty) { - put_user_pages_dirty_lock(p, num_pages, dirty); + unpin_user_pages_dirty_lock(p, num_pages, dirty); } /** diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c index 1a3cc2957e3a..a67599b5a550 100644 --- a/drivers/infiniband/hw/qib/qib_user_sdma.c +++ b/drivers/infiniband/hw/qib/qib_user_sdma.c @@ -317,7 +317,7 @@ static int qib_user_sdma_page_to_frags(const struct qib_devdata *dd, * the caller can ignore this page. */ if (put) { - put_user_page(page); + unpin_user_page(page); } else { /* coalesce case */ kunmap(page); @@ -631,7 +631,7 @@ static void qib_user_sdma_free_pkt_frag(struct device *dev, kunmap(pkt->addr[i].page); if (pkt->addr[i].put_page) - put_user_page(pkt->addr[i].page); + unpin_user_page(pkt->addr[i].page); else __free_page(pkt->addr[i].page); } else if (pkt->addr[i].kvaddr) { @@ -706,7 +706,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd, /* if error, return all pages not managed by pkt */ free_pages: while (i < j) - put_user_page(pages[i++]); + unpin_user_page(pages[i++]); done: return ret; diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 600896727d34..bd9f944b68fc 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -75,7 +75,7 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty) for_each_sg(chunk->page_list, sg, chunk->nents, i) { page = sg_page(sg); pa = sg_phys(sg); - put_user_pages_dirty_lock(&page, 1, dirty); + unpin_user_pages_dirty_lock(&page, 1, dirty); usnic_dbg("pa: %pa\n", &pa); } kfree(chunk); diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c index e53b07dcfed5..e2061dc0b043 100644 --- a/drivers/infiniband/sw/siw/siw_mem.c +++ b/drivers/infiniband/sw/siw/siw_mem.c @@ -63,7 +63,7 @@ struct siw_mem *siw_mem_id2obj(struct siw_device *sdev, int stag_index) static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages, bool dirty) { - put_user_pages_dirty_lock(chunk->plist, num_pages, dirty); + unpin_user_pages_dirty_lock(chunk->plist, num_pages, dirty); } void siw_umem_release(struct siw_umem *umem, bool dirty) diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 162a2633b1e3..13b65ed9e74c 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -349,8 +349,8 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma) BUG_ON(dma->sglen); if (dma->pages) { - put_user_pages_dirty_lock(dma->pages, dma->nr_pages, - dma->direction == DMA_FROM_DEVICE); + unpin_user_pages_dirty_lock(dma->pages, dma->nr_pages, + dma->direction == DMA_FROM_DEVICE); kfree(dma->pages); dma->pages = NULL; } diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c index 635a8bc1b480..bf523df2a90d 100644 --- a/drivers/platform/goldfish/goldfish_pipe.c +++ b/drivers/platform/goldfish/goldfish_pipe.c @@ -360,8 +360,8 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe, *consumed_size = pipe->command_buffer->rw_params.consumed_size; - put_user_pages_dirty_lock(pipe->pages, pages_count, - !is_write && *consumed_size > 0); + unpin_user_pages_dirty_lock(pipe->pages, pages_count, + !is_write && *consumed_size > 0); mutex_unlock(&pipe->lock); return 0; diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 18aa36b56896..c48ac1567f14 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -328,7 +328,7 @@ static int put_pfn(unsigned long pfn, int prot) if (!is_invalid_reserved_pfn(pfn)) { struct page *page = pfn_to_page(pfn); - put_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE); + unpin_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE); return 1; } return 0; diff --git a/fs/io_uring.c b/fs/io_uring.c index 15715eeebaec..0253a4d8fdc8 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3328,7 +3328,7 @@ static int io_sqe_buffer_unregister(struct io_ring_ctx *ctx) struct io_mapped_ubuf *imu = &ctx->user_bufs[i]; for (j = 0; j < imu->nr_bvecs; j++) - put_user_page(imu->bvec[j].bv_page); + unpin_user_page(imu->bvec[j].bv_page); if (ctx->account_mem) io_unaccount_mem(ctx->user, imu->nr_bvecs); @@ -3473,7 +3473,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, * release any pages we did get */ if (pret > 0) - put_user_pages(pages, pret); + unpin_user_pages(pages, pret); if (ctx->account_mem) io_unaccount_mem(ctx->user, nr_pages); kvfree(imu->bvec); diff --git a/include/linux/mm.h b/include/linux/mm.h index 253ec2d15f36..9d005381e1b2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1093,18 +1093,18 @@ static inline void put_page(struct page *page) * made against the page. ("gup-pinned" is another term for the latter). * * With this scheme, pin_user_pages() becomes special: such pages are marked as - * distinct from normal pages. As such, the put_user_page() call (and its + * distinct from normal pages. As such, the unpin_user_page() call (and its * variants) must be used in order to release gup-pinned pages. * * Choice of value: * * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference - * counts with respect to pin_user_pages() and put_user_page() becomes simpler, - * due to the fact that adding an even power of two to the page refcount has the - * effect of using only the upper N bits, for the code that counts up using the - * bias value. This means that the lower bits are left for the exclusive use of - * the original code that increments and decrements by one (or at least, by much - * smaller values than the bias value). + * counts with respect to pin_user_pages() and unpin_user_page() becomes + * simpler, due to the fact that adding an even power of two to the page + * refcount has the effect of using only the upper N bits, for the code that + * counts up using the bias value. This means that the lower bits are left for + * the exclusive use of the original code that increments and decrements by one + * (or at least, by much smaller values than the bias value). * * Of course, once the lower bits overflow into the upper bits (and this is * OK, because subtraction recovers the original values), then visual inspection @@ -1119,10 +1119,10 @@ static inline void put_page(struct page *page) */ #define GUP_PIN_COUNTING_BIAS (1UL << 10) -void put_user_page(struct page *page); -void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, - bool make_dirty); -void put_user_pages(struct page **pages, unsigned long npages); +void unpin_user_page(struct page *page); +void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, + bool make_dirty); +void unpin_user_pages(struct page **pages, unsigned long npages); /** * page_dma_pinned() - report if a page is pinned for DMA. @@ -2673,7 +2673,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_ANON 0x8000 /* don't do file mappings */ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ -#define FOLL_PIN 0x40000 /* pages must be released via put_user_page() */ +#define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each @@ -2708,7 +2708,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * Direct IO). This lets the filesystem know that some non-file-system entity is * potentially changing the pages' data. In contrast to FOLL_GET (whose pages * are released via put_page()), FOLL_PIN pages must be released, ultimately, by - * a call to put_user_page(). + * a call to unpin_user_page(). * * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different * and separate refcounting mechanisms, however, and that means that each has @@ -2716,7 +2716,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * * FOLL_GET: get_user_pages*() to acquire, and put_page() to release. * - * FOLL_PIN: pin_user_pages*() to acquire, and put_user_pages to release. + * FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release. * * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call. * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based @@ -2726,7 +2726,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never * directly by the caller. That's in order to help avoid mismatches when * releasing pages: get_user_pages*() pages must be released via put_page(), - * while pin_user_pages*() pages must be released via put_user_page(). + * while pin_user_pages*() pages must be released via unpin_user_page(). * * Please see Documentation/vm/pin_user_pages.rst for more information. */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 0485cba38d23..d66c1fb9d45e 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -245,7 +245,7 @@ enum node_stat_item { NR_WRITTEN, /* page writings since bootup */ NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */ NR_FOLL_PIN_REQUESTED, /* via: pin_user_page(), gup flag: FOLL_PIN */ - NR_FOLL_PIN_RETURNED, /* pages returned via put_user_page() */ + NR_FOLL_PIN_RETURNED, /* pages returned via unpin_user_page() */ NR_VM_NODE_STAT_ITEMS }; diff --git a/mm/gup.c b/mm/gup.c index b06434185e33..ac5b4c365827 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -113,15 +113,15 @@ static bool __put_devmap_managed_user_page(struct page *page) #endif /* CONFIG_DEV_PAGEMAP_OPS */ /** - * put_user_page() - release a dma-pinned page + * unpin_user_page() - release a dma-pinned page * @page: pointer to page to be released * * Pages that were pinned via pin_user_pages*() must be released via either - * put_user_page(), or one of the put_user_pages*() routines. This is so that - * such pages can be separately tracked and uniquely handled. In particular, - * interactions with RDMA and filesystems need special handling. + * unpin_user_page(), or one of the unpin_user_pages*() routines. This is so + * that such pages can be separately tracked and uniquely handled. In + * particular, interactions with RDMA and filesystems need special handling. */ -void put_user_page(struct page *page) +void unpin_user_page(struct page *page) { page = compound_head(page); @@ -139,10 +139,10 @@ void put_user_page(struct page *page) __update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1); } -EXPORT_SYMBOL(put_user_page); +EXPORT_SYMBOL(unpin_user_page); /** - * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages + * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages * @pages: array of pages to be maybe marked dirty, and definitely released. * @npages: number of pages in the @pages array. * @make_dirty: whether to mark the pages dirty @@ -152,19 +152,19 @@ EXPORT_SYMBOL(put_user_page); * * For each page in the @pages array, make that page (or its head page, if a * compound page) dirty, if @make_dirty is true, and if the page was previously - * listed as clean. In any case, releases all pages using put_user_page(), - * possibly via put_user_pages(), for the non-dirty case. + * listed as clean. In any case, releases all pages using unpin_user_page(), + * possibly via unpin_user_pages(), for the non-dirty case. * - * Please see the put_user_page() documentation for details. + * Please see the unpin_user_page() documentation for details. * * set_page_dirty_lock() is used internally. If instead, set_page_dirty() is * required, then the caller should a) verify that this is really correct, * because _lock() is usually required, and b) hand code it: - * set_page_dirty_lock(), put_user_page(). + * set_page_dirty_lock(), unpin_user_page(). * */ -void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, - bool make_dirty) +void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, + bool make_dirty) { unsigned long index; @@ -175,7 +175,7 @@ void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, */ if (!make_dirty) { - put_user_pages(pages, npages); + unpin_user_pages(pages, npages); return; } @@ -203,21 +203,21 @@ void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, */ if (!PageDirty(page)) set_page_dirty_lock(page); - put_user_page(page); + unpin_user_page(page); } } -EXPORT_SYMBOL(put_user_pages_dirty_lock); +EXPORT_SYMBOL(unpin_user_pages_dirty_lock); /** - * put_user_pages() - release an array of gup-pinned pages. + * unpin_user_pages() - release an array of gup-pinned pages. * @pages: array of pages to be marked dirty and released. * @npages: number of pages in the @pages array. * - * For each page in the @pages array, release the page using put_user_page(). + * For each page in the @pages array, release the page using unpin_user_page(). * - * Please see the put_user_page() documentation for details. + * Please see the unpin_user_page() documentation for details. */ -void put_user_pages(struct page **pages, unsigned long npages) +void unpin_user_pages(struct page **pages, unsigned long npages) { unsigned long index; @@ -227,9 +227,9 @@ void put_user_pages(struct page **pages, unsigned long npages) * single operation to the head page should suffice. */ for (index = 0; index < npages; index++) - put_user_page(pages[index]); + unpin_user_page(pages[index]); } -EXPORT_SYMBOL(put_user_pages); +EXPORT_SYMBOL(unpin_user_pages); #ifdef CONFIG_MMU static struct page *no_page_table(struct vm_area_struct *vma, @@ -1956,7 +1956,7 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start, ClearPageReferenced(page); if (flags & FOLL_PIN) - put_user_page(page); + unpin_user_page(page); else put_page(page); } diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 1ac089ad815f..76d32db48af8 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -35,7 +35,7 @@ static void put_back_pages(int cmd, struct page **pages, unsigned long nr_pages) case PIN_FAST_BENCHMARK: case PIN_BENCHMARK: - put_user_pages(pages, nr_pages); + unpin_user_pages(pages, nr_pages); break; } } diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c index fd20ab675b85..de41e830cdac 100644 --- a/mm/process_vm_access.c +++ b/mm/process_vm_access.c @@ -126,8 +126,8 @@ static int process_vm_rw_single_vec(unsigned long addr, pa += pinned_pages * PAGE_SIZE; /* If vm_write is set, the pages need to be made dirty: */ - put_user_pages_dirty_lock(process_pages, pinned_pages, - vm_write); + unpin_user_pages_dirty_lock(process_pages, pinned_pages, + vm_write); } return rc; diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index d071003b5e76..ac182c38f7b0 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -212,7 +212,7 @@ static int xdp_umem_map_pages(struct xdp_umem *umem) static void xdp_umem_unpin_pages(struct xdp_umem *umem) { - put_user_pages_dirty_lock(umem->pgs, umem->npgs, true); + unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); kfree(umem->pgs); umem->pgs = NULL;