From patchwork Fri Oct 12 06:00:09 2018
X-Patchwork-Submitter: john.hubbard@gmail.com
X-Patchwork-Id: 10637931
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe, Dan Williams, Jan Kara
Cc: linux-mm@kvack.org, Andrew Morton, LKML, linux-rdma, linux-fsdevel@vger.kernel.org, John Hubbard
Subject: [PATCH 1/6] mm: get_user_pages: consolidate error handling
Date: Thu, 11 Oct 2018 23:00:09 -0700
Message-Id: <20181012060014.10242-2-jhubbard@nvidia.com>
In-Reply-To: <20181012060014.10242-1-jhubbard@nvidia.com>
References: <20181012060014.10242-1-jhubbard@nvidia.com>

From: John Hubbard

An upcoming patch requires a way to operate on each page that any of the
get_user_pages_*() variants returns. In preparation for that, consolidate the
error handling for __get_user_pages(). This provides a single location (the
"out:" label) for operating on the collected set of pages that are about to
be returned.

As long as every use of the "ret" variable is being edited, rename
"ret" --> "err", so that its name matches its true role. This also gets rid
of two shadowed variable declarations, as a tiny beneficial side effect.

Reviewed-by: Jan Kara
Reviewed-by: Andrew Morton
Signed-off-by: John Hubbard
Reviewed-by: Balbir Singh
---
 mm/gup.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c index 1abc8b4afff6..05ee7c18e59a 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -660,6 +660,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, struct vm_area_struct **vmas, int *nonblocking) { long i = 0; + int err = 0; unsigned int page_mask; struct vm_area_struct *vma = NULL; @@ -685,18 +686,19 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, if (!vma || start >= vma->vm_end) { vma = find_extend_vma(mm, start); if (!vma && in_gate_area(mm, start)) { - int ret; - ret = get_gate_page(mm, start & PAGE_MASK, + err = get_gate_page(mm, start & PAGE_MASK, gup_flags, &vma, pages ? &pages[i] : NULL); - if (ret) - return i ? : ret; + if (err) + goto out; page_mask = 0; goto next_page; } - if (!vma || check_vma_flags(vma, gup_flags)) - return i ? : -EFAULT; + if (!vma || check_vma_flags(vma, gup_flags)) { + err = -EFAULT; + goto out; + } if (is_vm_hugetlb_page(vma)) { i = follow_hugetlb_page(mm, vma, pages, vmas, &start, &nr_pages, i, @@ -709,23 +711,25 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, * If we have a pending SIGKILL, don't keep faulting pages and * potentially allocating memory. */ - if (unlikely(fatal_signal_pending(current))) - return i ? i : -ERESTARTSYS; + if (unlikely(fatal_signal_pending(current))) { + err = -ERESTARTSYS; + goto out; + } cond_resched(); page = follow_page_mask(vma, start, foll_flags, &page_mask); if (!page) { - int ret; - ret = faultin_page(tsk, vma, start, &foll_flags, + err = faultin_page(tsk, vma, start, &foll_flags, nonblocking); - switch (ret) { + switch (err) { case 0: goto retry; case -EFAULT: case -ENOMEM: case -EHWPOISON: - return i ? i : ret; + goto out; case -EBUSY: - return i; + err = 0; + goto out; case -ENOENT: goto next_page; } @@ -737,7 +741,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, */ goto next_page; } else if (IS_ERR(page)) { - return i ? i : PTR_ERR(page); + err = PTR_ERR(page); + goto out; } if (pages) { pages[i] = page; @@ -757,7 +762,9 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, start += page_increm * PAGE_SIZE; nr_pages -= page_increm; } while (nr_pages); - return i; + +out: + return i ? i : err; } static bool vma_permits_fault(struct vm_area_struct *vma,
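As an illustration of the exit pattern this patch introduces, here is a minimal standalone sketch in plain C (not kernel code; the function and variable names are invented for the example). Every failure path jumps to a single "out:" label, and the function reports partial success as a positive count, falling back to the error code only when nothing at all was collected:

#include <stdio.h>

/* Toy stand-in for __get_user_pages(): collect up to nr_wanted items,
 * bail out on the first simulated fault, and let "out:" decide what to
 * return.
 */
static long collect_items(int *items, long nr_wanted)
{
	long i = 0;
	int err = 0;

	while (i < nr_wanted) {
		if (i == 3) {		/* pretend item 3 faults */
			err = -14;	/* -EFAULT */
			goto out;
		}
		items[i] = (int)i;
		i++;
	}
out:
	/* Partial success wins: return the count if anything was collected,
	 * otherwise the error code.
	 */
	return i ? i : err;
}

int main(void)
{
	int items[8];

	printf("collected: %ld\n", collect_items(items, 8));	/* prints 3 */
	return 0;
}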
From patchwork Fri Oct 12 06:00:10 2018
X-Patchwork-Submitter: john.hubbard@gmail.com
X-Patchwork-Id: 10637929
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe, Dan Williams, Jan Kara
Cc: linux-mm@kvack.org, Andrew Morton, LKML, linux-rdma, linux-fsdevel@vger.kernel.org, John Hubbard, Al Viro, Jerome Glisse, Christoph Hellwig, Ralph Campbell
Subject: [PATCH 2/6] mm: introduce put_user_page*(), placeholder versions
Date: Thu, 11 Oct 2018 23:00:10 -0700
Message-Id: <20181012060014.10242-3-jhubbard@nvidia.com>
In-Reply-To: <20181012060014.10242-1-jhubbard@nvidia.com>
References: <20181012060014.10242-1-jhubbard@nvidia.com>

From: John Hubbard

Introduces put_user_page(), which simply calls put_page(). This provides a
way to update all get_user_pages*() callers, so that they call
put_user_page(), instead of put_page().

Also introduces put_user_pages(), and a few dirty/locked variations, as a
replacement for release_pages(), and also as a replacement for open-coded
loops that release multiple pages. These may be used for subsequent
performance improvements, via batching of pages to be released.

This is the first step of fixing the problem described in [1]. The steps are:

1) (This patch): provide put_user_page*() routines, intended to be used for
   releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to invoke
   put_user_page*(), instead of put_page(). This involves dozens of call
   sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement special
   handling (especially in writeback paths) when the pages are backed by a
   filesystem. Again, [1] provides details as to why that is desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

CC: Matthew Wilcox
CC: Michal Hocko
CC: Christopher Lameter
CC: Jason Gunthorpe
CC: Dan Williams
CC: Jan Kara
CC: Al Viro
CC: Jerome Glisse
CC: Christoph Hellwig
CC: Ralph Campbell
Reviewed-by: Jan Kara
Signed-off-by: John Hubbard
---
 include/linux/mm.h | 20 +++++++++++
 mm/swap.c          | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h index 0416a7204be3..76d18aada9f8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -943,6 +943,26 @@ static inline void put_page(struct page *page) __put_page(page); } +/* + * put_user_page() - release a page that had previously been acquired via + * a call to one of the get_user_pages*() functions.
+ * + * Pages that were pinned via get_user_pages*() must be released via + * either put_user_page(), or one of the put_user_pages*() routines + * below. This is so that eventually, pages that are pinned via + * get_user_pages*() can be separately tracked and uniquely handled. In + * particular, interactions with RDMA and filesystems need special + * handling. + */ +static inline void put_user_page(struct page *page) +{ + put_page(page); +} + +void put_user_pages_dirty(struct page **pages, unsigned long npages); +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); +void put_user_pages(struct page **pages, unsigned long npages); + #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define SECTION_IN_PAGE_FLAGS #endif diff --git a/mm/swap.c b/mm/swap.c index 26fc9b5f1b6c..efab3a6b6f91 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -134,6 +134,89 @@ void put_pages_list(struct list_head *pages) } EXPORT_SYMBOL(put_pages_list); +/* + * put_user_pages_dirty() - for each page in the @pages array, make + * that page (or its head page, if a compound page) dirty, if it was + * previously listed as clean. Then, release the page using + * put_user_page(). + * + * Please see the put_user_page() documentation for details. + * + * set_page_dirty(), which does not lock the page, is used here. + * Therefore, it is the caller's responsibility to ensure that this is + * safe. If not, then put_user_pages_dirty_lock() should be called instead. + * + * @pages: array of pages to be marked dirty and released. + * @npages: number of pages in the @pages array. + * + */ +void put_user_pages_dirty(struct page **pages, unsigned long npages) +{ + unsigned long index; + + for (index = 0; index < npages; index++) { + struct page *page = compound_head(pages[index]); + + if (!PageDirty(page)) + set_page_dirty(page); + + put_user_page(page); + } +} +EXPORT_SYMBOL(put_user_pages_dirty); + +/* + * put_user_pages_dirty_lock() - for each page in the @pages array, make + * that page (or its head page, if a compound page) dirty, if it was + * previously listed as clean. Then, release the page using + * put_user_page(). + * + * Please see the put_user_page() documentation for details. + * + * This is just like put_user_pages_dirty(), except that it invokes + * set_page_dirty_lock(), instead of set_page_dirty(). + * + * @pages: array of pages to be marked dirty and released. + * @npages: number of pages in the @pages array. + * + */ +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages) +{ + unsigned long index; + + for (index = 0; index < npages; index++) { + struct page *page = compound_head(pages[index]); + + if (!PageDirty(page)) + set_page_dirty_lock(page); + + put_user_page(page); + } +} +EXPORT_SYMBOL(put_user_pages_dirty_lock); + +/* + * put_user_pages() - for each page in the @pages array, release the page + * using put_user_page(). + * + * Please see the put_user_page() documentation for details. + * + * This is just like put_user_pages_dirty(), except that it invokes + * set_page_dirty_lock(), instead of set_page_dirty(). + * + * @pages: array of pages to be marked dirty and released. + * @npages: number of pages in the @pages array. 
+ * + */ +void put_user_pages(struct page **pages, unsigned long npages) +{ + unsigned long index; + + for (index = 0; index < npages; index++) + put_user_page(pages[index]); +} +EXPORT_SYMBOL(put_user_pages); + /* * get_kernel_pages() - pin kernel pages in memory * @kiov: An array of struct kvec structures
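To show how a caller would pair these new release helpers with page pinning, here is a hypothetical driver-style sketch (illustrative only, not part of the patch; example_pin_and_release() is an invented name, and it assumes the 4.19-era get_user_pages_fast(start, nr_pages, write, pages) signature):

#include <linux/mm.h>
#include <linux/slab.h>

static int example_pin_and_release(unsigned long uaddr, int nr_pages, bool write)
{
	struct page **pages;
	int got;

	pages = kmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	got = get_user_pages_fast(uaddr, nr_pages, write, pages);
	if (got <= 0) {
		kfree(pages);
		return got ? got : -EFAULT;
	}

	/* ... program the device, wait for the DMA to complete ... */

	if (write)
		put_user_pages_dirty_lock(pages, got);	/* mark dirty, then release */
	else
		put_user_pages(pages, got);		/* just release */

	kfree(pages);
	return 0;
}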
From patchwork Fri Oct 12 06:00:11 2018
X-Patchwork-Submitter: john.hubbard@gmail.com
X-Patchwork-Id: 10637925
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe, Dan Williams, Jan Kara
Cc: linux-mm@kvack.org, Andrew Morton, LKML, linux-rdma, linux-fsdevel@vger.kernel.org, John Hubbard, Doug Ledford, Mike Marciniszyn, Dennis Dalessandro, Christian Benvenuti
Subject: [PATCH 3/6] infiniband/mm: convert put_page() to put_user_page*()
Date: Thu, 11 Oct 2018 23:00:11 -0700
Message-Id: <20181012060014.10242-4-jhubbard@nvidia.com>
In-Reply-To: <20181012060014.10242-1-jhubbard@nvidia.com>
References: <20181012060014.10242-1-jhubbard@nvidia.com>

From: John Hubbard

For infiniband code that retains pages via get_user_pages*(), release those
pages via the new put_user_page(), or put_user_pages*(), instead of
put_page().

This is a tiny part of the second step of fixing the problem described in
[1]. The steps are:

1) Provide put_user_page*() routines, intended to be used for releasing
   pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to invoke
   put_user_page*(), instead of put_page(). This involves dozens of call
   sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement special
   handling (especially in writeback paths) when the pages are backed by a
   filesystem. Again, [1] provides details as to why that is desirable.
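Condensed from the hfi1 and qib hunks in the diff below, the conversion pattern of step (2) looks like this (an illustrative summary only, using the variable names from those hunks):

	/* Before: open-coded loop around put_page() */
	for (i = 0; i < npages; i++) {
		if (dirty)
			set_page_dirty_lock(p[i]);
		put_page(p[i]);
	}

	/* After: the loop collapses into the new helpers */
	if (dirty)
		put_user_pages_dirty_lock(p, npages);
	else
		put_user_pages(p, npages);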
[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" CC: Doug Ledford CC: Jason Gunthorpe CC: Mike Marciniszyn CC: Dennis Dalessandro CC: Christian Benvenuti CC: linux-rdma@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-mm@kvack.org Reviewed-by: Jan Kara Reviewed-by: Dennis Dalessandro Acked-by: Jason Gunthorpe Signed-off-by: John Hubbard --- drivers/infiniband/core/umem.c | 7 ++++--- drivers/infiniband/core/umem_odp.c | 2 +- drivers/infiniband/hw/hfi1/user_pages.c | 11 ++++------- drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +++--- drivers/infiniband/hw/qib/qib_user_pages.c | 11 ++++------- drivers/infiniband/hw/qib/qib_user_sdma.c | 6 +++--- drivers/infiniband/hw/usnic/usnic_uiom.c | 7 ++++--- 7 files changed, 23 insertions(+), 27 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index a41792dbae1f..7ab7a3a35eb4 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -58,9 +58,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) { page = sg_page(sg); - if (!PageDirty(page) && umem->writable && dirty) - set_page_dirty_lock(page); - put_page(page); + if (umem->writable && dirty) + put_user_pages_dirty_lock(&page, 1); + else + put_user_page(page); } sg_free_table(&umem->sg_head); diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 6ec748eccff7..6227b89cf05c 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -717,7 +717,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt, ret = -EFAULT; break; } - put_page(local_page_list[j]); + put_user_page(local_page_list[j]); continue; } diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c index e341e6dcc388..99ccc0483711 100644 --- a/drivers/infiniband/hw/hfi1/user_pages.c +++ b/drivers/infiniband/hw/hfi1/user_pages.c @@ -121,13 +121,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np void hfi1_release_user_pages(struct mm_struct *mm, struct page **p, size_t npages, bool dirty) { - size_t i; - - for (i = 0; i < npages; i++) { - if (dirty) - set_page_dirty_lock(p[i]); - put_page(p[i]); - } + if (dirty) + put_user_pages_dirty_lock(p, npages); + else + put_user_pages(p, npages); if (mm) { /* during close after signal, mm can be NULL */ down_write(&mm->mmap_sem); diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index cc9c0c8ccba3..b8b12effd009 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -481,7 +481,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); if (ret < 0) { - put_page(pages[0]); + put_user_page(pages[0]); goto out; } @@ -489,7 +489,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, mthca_uarc_virt(dev, uar, i)); if (ret) { pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); - put_page(sg_page(&db_tab->page[i].mem)); + put_user_page(sg_page(&db_tab->page[i].mem)); goto out; } @@ -555,7 +555,7 @@ void mthca_cleanup_user_db_tab(struct mthca_dev *dev, struct mthca_uar *uar, if (db_tab->page[i].uvirt) { mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1); pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); - 
put_page(sg_page(&db_tab->page[i].mem)); + put_user_page(sg_page(&db_tab->page[i].mem)); } } diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index 16543d5e80c3..1a5c64c8695f 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -40,13 +40,10 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages, int dirty) { - size_t i; - - for (i = 0; i < num_pages; i++) { - if (dirty) - set_page_dirty_lock(p[i]); - put_page(p[i]); - } + if (dirty) + put_user_pages_dirty_lock(p, num_pages); + else + put_user_pages(p, num_pages); } /* diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c index 926f3c8eba69..4a4b802b011f 100644 --- a/drivers/infiniband/hw/qib/qib_user_sdma.c +++ b/drivers/infiniband/hw/qib/qib_user_sdma.c @@ -321,7 +321,7 @@ static int qib_user_sdma_page_to_frags(const struct qib_devdata *dd, * the caller can ignore this page. */ if (put) { - put_page(page); + put_user_page(page); } else { /* coalesce case */ kunmap(page); @@ -635,7 +635,7 @@ static void qib_user_sdma_free_pkt_frag(struct device *dev, kunmap(pkt->addr[i].page); if (pkt->addr[i].put_page) - put_page(pkt->addr[i].page); + put_user_page(pkt->addr[i].page); else __free_page(pkt->addr[i].page); } else if (pkt->addr[i].kvaddr) { @@ -710,7 +710,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd, /* if error, return all pages not managed by pkt */ free_pages: while (i < j) - put_page(pages[i++]); + put_user_page(pages[i++]); done: return ret; diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 9dd39daa602b..9e3615fd05f7 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -89,9 +89,10 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty) for_each_sg(chunk->page_list, sg, chunk->nents, i) { page = sg_page(sg); pa = sg_phys(sg); - if (!PageDirty(page) && dirty) - set_page_dirty_lock(page); - put_page(page); + if (dirty) + put_user_pages_dirty_lock(&page, 1); + else + put_user_page(page); usnic_dbg("pa: %pa\n", &pa); } kfree(chunk); From patchwork Fri Oct 12 06:00:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: john.hubbard@gmail.com X-Patchwork-Id: 10637921 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A53D215E2 for ; Fri, 12 Oct 2018 06:00:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 96B3B2BB24 for ; Fri, 12 Oct 2018 06:00:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8A5B32BFF6; Fri, 12 Oct 2018 06:00:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 82E452BB24 for ; Fri, 12 Oct 2018 06:00:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727827AbeJLNbY (ORCPT ); Fri, 12 Oct 2018 09:31:24 
-0400
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe, Dan Williams, Jan Kara
Cc: linux-mm@kvack.org, Andrew Morton, LKML, linux-rdma, linux-fsdevel@vger.kernel.org, John Hubbard
Subject: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count
Date: Thu, 11 Oct 2018 23:00:12 -0700
Message-Id: <20181012060014.10242-5-jhubbard@nvidia.com>
In-Reply-To: <20181012060014.10242-1-jhubbard@nvidia.com>
References: <20181012060014.10242-1-jhubbard@nvidia.com>

From: John Hubbard

Add two struct page fields that, combined, are unioned with struct
page->lru. There is no change in the size of struct page. These new fields
are for type safety and clarity.

Also add page flag accessors to test, set and clear the new
page->dma_pinned_flags field.
The page->dma_pinned_count field will be used in upcoming patches Signed-off-by: John Hubbard --- include/linux/mm_types.h | 22 +++++++++++++----- include/linux/page-flags.h | 47 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 63 insertions(+), 6 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5ed8f6292a53..017ab82e36ca 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -78,12 +78,22 @@ struct page { */ union { struct { /* Page cache and anonymous pages */ - /** - * @lru: Pageout list, eg. active_list protected by - * zone_lru_lock. Sometimes used as a generic list - * by the page owner. - */ - struct list_head lru; + union { + /** + * @lru: Pageout list, eg. active_list protected + * by zone_lru_lock. Sometimes used as a + * generic list by the page owner. + */ + struct list_head lru; + /* Used by get_user_pages*(). Pages may not be + * on an LRU while these dma_pinned_* fields + * are in use. + */ + struct { + unsigned long dma_pinned_flags; + atomic_t dma_pinned_count; + }; + }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; pgoff_t index; /* Our offset within mapping. */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 74bee8cecf4c..81ed52c3caae 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -425,6 +425,53 @@ static __always_inline int __PageMovable(struct page *page) PAGE_MAPPING_MOVABLE; } +/* + * Because page->dma_pinned_flags is unioned with page->lru, any page that + * uses these flags must NOT be on an LRU. That's partly enforced by + * ClearPageDmaPinned, which gives the page back to LRU. + * + * PageDmaPinned also corresponds to PageTail (the 0th bit in the first union + * of struct page), and this flag is checked without knowing whether it is a + * tail page or a PageDmaPinned page. Therefore, start the flags at bit 1 (0x2), + * rather than bit 0. + */ +#define PAGE_DMA_PINNED 0x2 +#define PAGE_DMA_PINNED_FLAGS (PAGE_DMA_PINNED) + +/* + * Because these flags are read outside of a lock, ensure visibility between + * different threads, by using READ|WRITE_ONCE. + */ +static __always_inline int PageDmaPinnedFlags(struct page *page) +{ + VM_BUG_ON(page != compound_head(page)); + return (READ_ONCE(page->dma_pinned_flags) & PAGE_DMA_PINNED_FLAGS) != 0; +} + +static __always_inline int PageDmaPinned(struct page *page) +{ + VM_BUG_ON(page != compound_head(page)); + return (READ_ONCE(page->dma_pinned_flags) & PAGE_DMA_PINNED) != 0; +} + +static __always_inline void SetPageDmaPinned(struct page *page) +{ + VM_BUG_ON(page != compound_head(page)); + WRITE_ONCE(page->dma_pinned_flags, PAGE_DMA_PINNED); +} + +static __always_inline void ClearPageDmaPinned(struct page *page) +{ + VM_BUG_ON(page != compound_head(page)); + VM_BUG_ON_PAGE(!PageDmaPinnedFlags(page), page); + + /* This does a WRITE_ONCE to the lru.next, which is also the + * page->dma_pinned_flags field. So in addition to restoring page->lru, + * this provides visibility to other threads. 
+ */ + INIT_LIST_HEAD(&page->lru); +} + #ifdef CONFIG_KSM /* * A KSM page is one of those write-protected "shared pages" or "merged pages"
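The layout trick in patch 4 can be seen in a standalone sketch (plain C11, not kernel code; struct fake_page is an invented stand-in for struct page, and the real patch uses atomic_t for the count). Overlaying the flags/count pair on the two words already occupied by the lru list head is what keeps struct page the same size:

#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

struct fake_page {
	union {
		struct list_head lru;		/* normal case: page sits on an LRU */
		struct {			/* gup case: page is DMA-pinned */
			unsigned long dma_pinned_flags;
			int dma_pinned_count;	/* atomic_t in the real patch */
		};
	};
};

/* Both union members fit in the two words that lru already occupies. */
static_assert(sizeof(struct fake_page) == sizeof(struct list_head),
	      "union must not grow the struct");

int main(void) { return 0; }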
From patchwork Fri Oct 12 06:00:13 2018
X-Patchwork-Submitter: john.hubbard@gmail.com
X-Patchwork-Id: 10637917
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe, Dan Williams, Jan Kara
Cc: linux-mm@kvack.org, Andrew Morton, LKML, linux-rdma, linux-fsdevel@vger.kernel.org, John Hubbard
Subject: [PATCH 5/6] mm: introduce zone_gup_lock, for dma-pinned pages
Date: Thu, 11 Oct 2018 23:00:13 -0700
Message-Id: <20181012060014.10242-6-jhubbard@nvidia.com>
In-Reply-To: <20181012060014.10242-1-jhubbard@nvidia.com>
References: <20181012060014.10242-1-jhubbard@nvidia.com>

From: John Hubbard

The page->dma_pinned_flags and _count fields require lock protection. A lock
at approximately the granularity of the zone_lru_lock is called for, but
adding to the locking contention of zone_lru_lock is undesirable, because
that is a pre-existing hot spot. Fortunately, these new dma_pinned_* fields
can use an independent lock, so this patch creates an entirely new lock,
right next to the zone_lru_lock.

Why "zone_gup_lock"? Most of the naming refers to "DMA-pinned pages", but
"zone DMA lock" has other meanings already, so this is called zone_gup_lock
instead. The "dma pinning" is a result of get_user_pages (gup) being called,
so the name still helps explain its use.

Signed-off-by: John Hubbard
---
 include/linux/mmzone.h | 6 ++++++
 mm/page_alloc.c        | 1 +
 2 files changed, 7 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d4b0c79d2924..971a63f84ad5 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -661,6 +661,7 @@ typedef struct pglist_data { enum zone_type kswapd_classzone_idx; int kswapd_failures; /* Number of 'reclaimed == 0' runs */ + spinlock_t pinned_dma_lock; #ifdef CONFIG_COMPACTION int kcompactd_max_order; @@ -730,6 +731,11 @@ static inline spinlock_t *zone_lru_lock(struct zone *zone) return &zone->zone_pgdat->lru_lock; } +static inline spinlock_t *zone_gup_lock(struct zone *zone) +{ + return &zone->zone_pgdat->pinned_dma_lock; +} + static inline struct lruvec *node_lruvec(struct pglist_data *pgdat) { return &pgdat->lruvec; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e2ef1c17942f..850f90223cc7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6225,6 +6225,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat) pgdat_page_ext_init(pgdat); spin_lock_init(&pgdat->lru_lock); + spin_lock_init(&pgdat->pinned_dma_lock); lruvec_init(node_lruvec(pgdat)); }
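The intended use of the new accessor, which the next patch in the series makes concrete, is simply to bracket any dma_pinned_* update with the per-node lock (a condensed sketch, not a complete function):

	struct zone *zone = page_zone(page);

	spin_lock(zone_gup_lock(zone));
	/* ... inspect or update page->dma_pinned_flags / dma_pinned_count ... */
	spin_unlock(zone_gup_lock(zone));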
From patchwork Fri Oct 12 06:00:14 2018
X-Patchwork-Submitter: john.hubbard@gmail.com
X-Patchwork-Id: 10637911
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe, Dan Williams, Jan Kara
Cc: linux-mm@kvack.org, Andrew Morton, LKML, linux-rdma, linux-fsdevel@vger.kernel.org, John Hubbard
Subject: [PATCH 6/6] mm: track gup pages with page->dma_pinned_* fields
Date: Thu, 11 Oct 2018 23:00:14 -0700
Message-Id: <20181012060014.10242-7-jhubbard@nvidia.com>
In-Reply-To: <20181012060014.10242-1-jhubbard@nvidia.com>
References: <20181012060014.10242-1-jhubbard@nvidia.com>

From: John Hubbard

This patch sets and restores the new page->dma_pinned_flags and
page->dma_pinned_count fields, but does not actually use them for anything
yet.

In order to use these fields at all, the page must be removed from any LRU
list that it's on. The patch also adds some precautions that prevent the
page from getting moved back onto an LRU, once it is in this state.

This is in preparation to fix some problems that came up when using devices
(NICs, GPUs, for example) that set up direct access to a chunk of system
(CPU) memory, so that they can DMA to/from that memory.

CC: Matthew Wilcox
CC: Jan Kara
CC: Dan Williams
Signed-off-by: John Hubbard
---
 include/linux/mm.h | 15 ++-----------
 mm/gup.c           | 56 ++++++++++++++++++++++++++++++++++++++++++++--
 mm/memcontrol.c    |  7 ++++++
 mm/swap.c          | 51 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 114 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h index 76d18aada9f8..44878d21e27b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -944,21 +944,10 @@ static inline void put_page(struct page *page) } /* - * put_user_page() - release a page that had previously been acquired via - * a call to one of the get_user_pages*() functions. - * * Pages that were pinned via get_user_pages*() must be released via - * either put_user_page(), or one of the put_user_pages*() routines - * below. This is so that eventually, pages that are pinned via - * get_user_pages*() can be separately tracked and uniquely handled. In - * particular, interactions with RDMA and filesystems need special - * handling. + * one of these put_user_pages*() routines: */ -static inline void put_user_page(struct page *page) -{ - put_page(page); -} - +void put_user_page(struct page *page); void put_user_pages_dirty(struct page **pages, unsigned long npages); void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); void put_user_pages(struct page **pages, unsigned long npages); diff --git a/mm/gup.c b/mm/gup.c index 05ee7c18e59a..fddbc30cde89 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -20,6 +20,51 @@ #include "internal.h" +static int pin_page_for_dma(struct page *page) +{ + int ret = 0; + struct zone *zone; + + page = compound_head(page); + zone = page_zone(page); + + spin_lock(zone_gup_lock(zone)); + + if (PageDmaPinned(page)) { /* Page was not on an LRU list, because it was DMA-pinned.
*/ + VM_BUG_ON_PAGE(PageLRU(page), page); + + atomic_inc(&page->dma_pinned_count); + goto unlock_out; + } + + /* + * Note that page->dma_pinned_flags is unioned with page->lru. + * Therefore, the rules are: checking if any of the + * PAGE_DMA_PINNED_FLAGS bits are set may be done while page->lru + * is in use. However, setting those flags requires that + * the page is both locked, and also, removed from the LRU. + */ + ret = isolate_lru_page(page); + + if (ret == 0) { + /* Avoid problems later, when freeing the page: */ + ClearPageActive(page); + ClearPageUnevictable(page); + + /* counteract isolate_lru_page's effects: */ + put_page(page); + + atomic_set(&page->dma_pinned_count, 1); + SetPageDmaPinned(page); + } + +unlock_out: + spin_unlock(zone_gup_lock(zone)); + + return ret; +} + static struct page *no_page_table(struct vm_area_struct *vma, unsigned int flags) { @@ -659,7 +704,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas, int *nonblocking) { - long i = 0; + long i = 0, j; int err = 0; unsigned int page_mask; struct vm_area_struct *vma = NULL; @@ -764,6 +809,10 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, } while (nr_pages); out: + if (pages) + for (j = 0; j < i; j++) + pin_page_for_dma(pages[j]); + return i ? i : err; } @@ -1841,7 +1890,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages) { unsigned long addr, len, end; - int nr = 0, ret = 0; + int nr = 0, ret = 0, i; start &= PAGE_MASK; addr = start; @@ -1862,6 +1911,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write, ret = nr; } + for (i = 0; i < nr; i++) + pin_page_for_dma(pages[i]); + if (nr < nr_pages) { /* Try to get the remaining pages with get_user_pages */ start += nr << PAGE_SHIFT; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e79cb59552d9..af9719756081 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2335,6 +2335,11 @@ static void lock_page_lru(struct page *page, int *isolated) if (PageLRU(page)) { struct lruvec *lruvec; + /* LRU and PageDmaPinned are mutually exclusive: they use the + * same fields in struct page, but for different purposes. + */ + VM_BUG_ON_PAGE(PageDmaPinned(page), page); + lruvec = mem_cgroup_page_lruvec(page, zone->zone_pgdat); ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_lru(page)); @@ -2352,6 +2357,8 @@ static void unlock_page_lru(struct page *page, int isolated) lruvec = mem_cgroup_page_lruvec(page, zone->zone_pgdat); VM_BUG_ON_PAGE(PageLRU(page), page); + VM_BUG_ON_PAGE(PageDmaPinned(page), page); + SetPageLRU(page); add_page_to_lru_list(page, lruvec, page_lru(page)); } diff --git a/mm/swap.c b/mm/swap.c index efab3a6b6f91..6b2b1a958a67 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -134,6 +134,46 @@ void put_pages_list(struct list_head *pages) } EXPORT_SYMBOL(put_pages_list); +/* + * put_user_page() - release a page that had previously been acquired via + * a call to one of the get_user_pages*() functions. + * + * Pages that were pinned via get_user_pages*() must be released via + * either put_user_page(), or one of the put_user_pages*() routines + * below. This is so that eventually, pages that are pinned via + * get_user_pages*() can be separately tracked and uniquely handled. In + * particular, interactions with RDMA and filesystems need special + * handling. 
+ */ +void put_user_page(struct page *page) +{ + struct zone *zone = page_zone(page); + + page = compound_head(page); + + VM_BUG_ON_PAGE(PageLRU(page), page); + VM_BUG_ON_PAGE(!PageDmaPinned(page), page); + + if (atomic_dec_and_test(&page->dma_pinned_count)) { + spin_lock(zone_gup_lock(zone)); + + /* Re-check while holding the lock, because + * pin_page_for_dma() or get_page() may have snuck in right + * after the atomic_dec_and_test, and raised the count + * above zero again. If so, just leave the flag set. And + * because the atomic_dec_and_test above already got the + * accounting correct, no other action is required. + */ + if (atomic_read(&page->dma_pinned_count) == 0) + ClearPageDmaPinned(page); + + spin_unlock(zone_gup_lock(zone)); + } + + put_page(page); +} +EXPORT_SYMBOL(put_user_page); + /* * put_user_pages_dirty() - for each page in the @pages array, make * that page (or its head page, if a compound page) dirty, if it was @@ -907,6 +947,11 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, VM_BUG_ON_PAGE(!PageHead(page), page); VM_BUG_ON_PAGE(PageCompound(page_tail), page); VM_BUG_ON_PAGE(PageLRU(page_tail), page); + + /* LRU and PageDmaPinned are mutually exclusive: they use the + * same fields in struct page, but for different purposes. + */ + VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page); VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&lruvec_pgdat(lruvec)->lru_lock)); @@ -946,6 +991,12 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, VM_BUG_ON_PAGE(PageLRU(page), page); + /* LRU and PageDmaPinned are mutually exclusive: they use the + * same fields in struct page, but for different purposes. + */ + if (PageDmaPinned(page)) + return; + SetPageLRU(page); /* * Page becomes evictable in two ways: