From patchwork Fri Jan 31 06:12:17 2020
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 11359163
Date: Thu, 30 Jan 2020 22:12:17 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, alex.williamson@redhat.com,
 aneesh.kumar@linux.ibm.com, axboe@kernel.dk, bjorn.topel@intel.com,
 corbet@lwn.net, dan.j.williams@intel.com, daniel.vetter@ffwll.ch,
 hch@lst.de, hverkuil-cisco@xs4all.nl, ira.weiny@intel.com, jack@suse.cz,
 jgg@mellanox.com, jgg@ziepe.ca, jglisse@redhat.com, jhubbard@nvidia.com,
 kirill@shutemov.name, leonro@mellanox.com, linux-mm@kvack.org,
 mchehab@kernel.org, mm-commits@vger.kernel.org, rppt@linux.ibm.com,
 torvalds@linux-foundation.org
Subject: [patch 024/118] mm/gup: factor out duplicate code from four routines
Message-ID: <20200131061217.fTNI3jtFV%akpm@linux-foundation.org>
In-Reply-To: <20200130221021.5f0211c56346d5485af07923@linux-foundation.org>

From: John Hubbard
Subject: mm/gup: factor out duplicate code from four routines

Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12.

Overview:

This is a prerequisite to solving the problem of proper interactions
between file-backed pages and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)

A new internal gup flag, FOLL_PIN, is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on.  That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be incrementally
applied to the kernel call sites that, until now, have simply been
calling get_user_pages() ("gup").  In other words, opt in by changing
from this:

    get_user_pages()    (sets FOLL_GET)
    put_page()

to this:

    pin_user_pages()    (sets FOLL_PIN)
    unpin_user_page()
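As a purely illustrative sketch (not taken from this patch or series), a
call-site conversion might end up looking roughly like the following.  The
example_* names are invented for illustration; only get_user_pages(),
put_page(), pin_user_pages() and unpin_user_page() are real interfaces
here, and the mmap_sem handling matches kernels of this era:

    /*
     * Hypothetical driver helpers, shown only to illustrate the
     * FOLL_GET -> FOLL_PIN conversion pattern described above.
     */
    #include <linux/mm.h>
    #include <linux/sched.h>

    static long example_pin_buffer(unsigned long uaddr, unsigned long nr_pages,
                                   struct page **pages)
    {
            long pinned;

            down_read(&current->mm->mmap_sem);
            /* Old pattern: get_user_pages(uaddr, nr_pages, FOLL_WRITE, pages, NULL); */
            pinned = pin_user_pages(uaddr, nr_pages, FOLL_WRITE, pages, NULL);
            up_read(&current->mm->mmap_sem);

            return pinned;
    }

    static void example_unpin_buffer(struct page **pages, long pinned)
    {
            long i;

            /* Old pattern: put_page(pages[i]) for each page. */
            for (i = 0; i < pinned; i++)
                    unpin_user_page(pages[i]);
    }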
Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
  and some directed testing to exercise some of the changes.  And as you
  can see, gup_benchmark is enhanced to exercise this.  Basically, I've
  been able to runtime test the core get_user_pages() and
  pin_user_pages() and related routines, but not so much several of the
  call sites; those are generally just a couple of lines changed each.

  Not much of the kernel is actually using this, which on one hand
  reduces risk quite a lot.  But on the other hand, testing coverage is
  low.  So I'd love it if, in particular, the Infiniband and PowerPC
  folks could do a smoke test of this series for me.

  Runtime testing for the call sites so far is pretty light:

    * io_uring: Some directed tests from liburing exercise this, and
      they pass.
    * process_vm_access.c: A small directed test passes.
    * gup_benchmark: the enhanced version hits the new gup.c code, and
      passes.
    * infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
    * VFIO: compiles (I'm vowing to set up a run time test soon, but it's
      not ready just yet)
    * powerpc: it compiles...
    * drm/via: compiles...
    * goldfish: compiles...
    * net/xdp: compiles...
    * media/v4l2: compiles...

[1] Some slow progress on get_user_pages() (Apr 2, 2019):
    https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
    https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
    https://lwn.net/Articles/753027/

This patch (of 22):

There are four locations in gup.c that have a fair amount of code
duplication.  This means that changing one requires making the same
changes in four places, not to mention reading the same code four times,
and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the
overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the
error cases, by doing a mass subtraction of the refcount followed by a
single put_page() of the compound head, instead of one put_page() per
reference.

Also, further simplify (slightly) by waiting until the successful end of
each routine to increment *nr.

Link: http://lkml.kernel.org/r/20200107224558.2362728-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: Jérôme Glisse
Reviewed-by: Jan Kara
Cc: Kirill A. Shutemov
Cc: Ira Weiny
Cc: Christoph Hellwig
Cc: Aneesh Kumar K.V
Cc: Alex Williamson
Cc: Björn Töpel
Cc: Daniel Vetter
Cc: Dan Williams
Cc: Hans Verkuil
Cc: Jason Gunthorpe
Cc: Jason Gunthorpe
Cc: Jens Axboe
Cc: Jonathan Corbet
Cc: Leon Romanovsky
Cc: Mauro Carvalho Chehab
Cc: Mike Rapoport
Signed-off-by: Andrew Morton
---

 mm/gup.c |   95 ++++++++++++++++++++++-------------------------------
 1 file changed, 40 insertions(+), 55 deletions(-)

--- a/mm/gup.c~mm-gup-factor-out-duplicate-code-from-four-routines
+++ a/mm/gup.c
@@ -1978,6 +1978,29 @@ static int __gup_device_huge_pud(pud_t p
 }
 #endif
 
+static int record_subpages(struct page *page, unsigned long addr,
+                           unsigned long end, struct page **pages)
+{
+        int nr;
+
+        for (nr = 0; addr != end; addr += PAGE_SIZE)
+                pages[nr++] = page++;
+
+        return nr;
+}
+
+static void put_compound_head(struct page *page, int refs)
+{
+        VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+        /*
+         * Calling put_page() for each ref is unnecessarily slow. Only the last
+         * ref needs a put_page().
+         */
+        if (refs > 1)
+                page_ref_sub(page, refs - 1);
+        put_page(page);
+}
+
 #ifdef CONFIG_ARCH_HAS_HUGEPD
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
                                       unsigned long sz)
@@ -2007,32 +2030,20 @@ static int gup_hugepte(pte_t *ptep, unsi
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	refs = 0;
 	head = pte_page(pte);
-
 	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
-	do {
-		VM_BUG_ON(compound_head(page) != head);
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(head, refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-		/* Could be optimized better */
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2079,28 +2090,19 @@ static int gup_huge_pmd(pmd_t orig, pmd_
 		return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
 	}
 
-	refs = 0;
 	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pmd_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2120,28 +2122,19 @@ static int gup_huge_pud(pud_t orig, pud_
 		return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
 	}
 
-	refs = 0;
 	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pud_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2157,28 +2150,20 @@ static int gup_huge_pgd(pgd_t orig, pgd_
 		return 0;
 
 	BUILD_BUG_ON(pgd_devmap(orig));
-	refs = 0;
+
 	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pgd_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
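
A brief illustrative note on the refcount arithmetic (not part of the patch
itself): for a gup_fast() walk that spans an entire PMD-sized huge page
(2 MiB with 4 KiB base pages), record_subpages() stores 512 struct page
pointers and returns refs == 512.  On the error paths, put_compound_head()
then does one page_ref_sub(head, 511) plus a single put_page(head), which
has the same end result as the 512 individual put_page() calls the old code
issued, but with far fewer atomic operations.  A minimal sketch of that
equivalence, reusing the helper introduced by the diff above:

    /* Old error path: one atomic decrement per reference taken. */
    static void old_error_path(struct page *head, int refs)
    {
            while (refs--)
                    put_page(head);
    }

    /* New error path: one bulk subtraction, then a single put_page().
     * *nr no longer needs to be rolled back, since it is now only
     * incremented on the success path.
     */
    static void new_error_path(struct page *head, int refs)
    {
            put_compound_head(head, refs);
    }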