From patchwork Fri Feb 8 07:56:48 2019
X-Patchwork-Submitter: john.hubbard@gmail.com
X-Patchwork-Id: 10802461
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Andrew Morton, linux-mm@kvack.org
Cc: Al Viro, Christian Benvenuti, Christoph Hellwig, Christopher Lameter,
    Dan Williams, Dave Chinner, Dennis Dalessandro, Doug Ledford, Jan Kara,
    Jason Gunthorpe, Jerome Glisse, Matthew Wilcox, Michal Hocko,
    Mike Rapoport, Mike Marciniszyn, Ralph Campbell, Tom Talpey, LKML,
    linux-fsdevel@vger.kernel.org, John Hubbard
Subject: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
Date: Thu, 7 Feb 2019 23:56:48 -0800
Message-Id: <20190208075649.3025-2-jhubbard@nvidia.com>
In-Reply-To: <20190208075649.3025-1-jhubbard@nvidia.com>
References: <20190208075649.3025-1-jhubbard@nvidia.com>

From: John Hubbard

Introduces put_user_page(), which simply calls put_page(). This
provides a way to update all get_user_pages*() callers, so that they
call put_user_page() instead of put_page().

Also introduces put_user_pages(), and a few dirty/locked variations, as
a replacement for release_pages(), and also as a replacement for
open-coded loops that release multiple pages. These may be used for
subsequent performance improvements, via batching of pages to be
released.

This is the first step of fixing a problem (also described in [1] and
[2]) with interactions between get_user_pages ("gup") and filesystems.

Problem description: let's start with a bug report. Below is what
happens sometimes, under memory pressure, when a driver pins some pages
via gup, and then marks those pages dirty, and releases them. Note that
the gup documentation actually recommends that pattern. The problem is
that the filesystem may do a writeback while the pages were gup-pinned,
and then the filesystem believes that the pages are clean. So, when the
driver later marks the pages as dirty, that conflicts with the
filesystem's page tracking and results in a BUG(), like this one that I
experienced:

    kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
    backtrace:
        ext4_writepage
        __writepage
        write_cache_pages
        ext4_writepages
        do_writepages
        __writeback_single_inode
        writeback_sb_inodes
        __writeback_inodes_wb
        wb_writeback
        wb_workfn
        process_one_work
        worker_thread
        kthread
        ret_from_fork

...which is due to the file system asserting that there are still
buffer heads attached:

        ({                                                      \
                BUG_ON(!PagePrivate(page));                     \
                ((struct buffer_head *)page_private(page));     \
        })

Dave Chinner's description of this is very clear:

    "The fundamental issue is that ->page_mkwrite must be called on
    every write access to a clean file backed page, not just the first
    one. How long the GUP reference lasts is irrelevant, if the page is
    clean and you need to dirty it, you must call ->page_mkwrite before
    it is marked writeable and dirtied. Every. Time."

This is just one symptom of the larger design problem: filesystems do
not actually support get_user_pages() being called on their pages, and
letting hardware write directly to those pages--even though that
pattern has been going on since about 2005 or so.
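To make the failing sequence concrete, here is a minimal sketch of that
driver pattern (hypothetical editorial code, not part of this patch;
the function and variable names are invented, error handling is
trimmed, and get_user_pages_fast() is shown with its pre-5.1
int-write-flag signature, current as of this posting):

    /*
     * Hypothetical driver code illustrating the pin/dirty/release
     * pattern described above. Not part of this patch.
     */
    static int example_dma_write_to_user_pages(unsigned long uaddr,
                                               int nr_pages,
                                               struct page **pages)
    {
            int i, got;

            /* Pin the user pages so that hardware can write to them. */
            got = get_user_pages_fast(uaddr, nr_pages, 1, pages);
            if (got <= 0)
                    return got;

            /*
             * DMA into the pages happens here. Meanwhile, under memory
             * pressure, the filesystem may write the pages back and
             * mark them clean.
             */

            for (i = 0; i < got; i++) {
                    /*
                     * Dirtying a page that the filesystem now considers
                     * clean, without ->page_mkwrite, is what trips the
                     * ext4 BUG() shown above.
                     */
                    set_page_dirty_lock(pages[i]);
                    put_page(pages[i]);
            }
            return 0;
    }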
The steps to fix it are:

1) (This patch): provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to invoke
   put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate
   from the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem.

[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

Cc: Al Viro
Cc: Christoph Hellwig
Cc: Christopher Lameter
Cc: Dan Williams
Cc: Dave Chinner
Cc: Jan Kara
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Mike Rapoport
Cc: Ralph Campbell
Reviewed-by: Jan Kara
Signed-off-by: John Hubbard
Reviewed-by: Mike Rapoport # docs
---
 include/linux/mm.h | 24 ++++++++++++++
 mm/swap.c          | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb6408fe73..809b7397d41e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -993,6 +993,30 @@ static inline void put_page(struct page *page)
 	__put_page(page);
 }
 
+/**
+ * put_user_page() - release a gup-pinned page
+ * @page: pointer to page to be released
+ *
+ * Pages that were pinned via get_user_pages*() must be released via
+ * either put_user_page(), or one of the put_user_pages*() routines
+ * below. This is so that eventually, pages that are pinned via
+ * get_user_pages*() can be separately tracked and uniquely handled. In
+ * particular, interactions with RDMA and filesystems need special
+ * handling.
+ *
+ * put_user_page() and put_page() are not interchangeable, despite this early
+ * implementation that makes them look the same. put_user_page() calls must
+ * be perfectly matched up with get_user_page() calls.
+ */
+static inline void put_user_page(struct page *page)
+{
+	put_page(page);
+}
+
+void put_user_pages_dirty(struct page **pages, unsigned long npages);
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
+void put_user_pages(struct page **pages, unsigned long npages);
+
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
 #endif

diff --git a/mm/swap.c b/mm/swap.c
index 4929bc1be60e..7c42ca45bb89 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -133,6 +133,88 @@ void put_pages_list(struct list_head *pages)
 }
 EXPORT_SYMBOL(put_pages_list);
 
+typedef int (*set_dirty_func)(struct page *page);
+
+static void __put_user_pages_dirty(struct page **pages,
+				   unsigned long npages,
+				   set_dirty_func sdf)
+{
+	unsigned long index;
+
+	for (index = 0; index < npages; index++) {
+		struct page *page = compound_head(pages[index]);
+
+		if (!PageDirty(page))
+			sdf(page);
+
+		put_user_page(page);
+	}
+}
+
+/**
+ * put_user_pages_dirty() - release and dirty an array of gup-pinned pages
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ * "gup-pinned page" refers to a page that has had one of the get_user_pages()
+ * variants called on that page.
+ *
+ * For each page in the @pages array, make that page (or its head page, if a
+ * compound page) dirty, if it was previously listed as clean. Then, release
+ * the page using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * set_page_dirty(), which does not lock the page, is used here.
+ * Therefore, it is the caller's responsibility to ensure that this is
+ * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ *
+ */
+void put_user_pages_dirty(struct page **pages, unsigned long npages)
+{
+	__put_user_pages_dirty(pages, npages, set_page_dirty);
+}
+EXPORT_SYMBOL(put_user_pages_dirty);
+
+/**
+ * put_user_pages_dirty_lock() - release and dirty an array of gup-pinned pages
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ * For each page in the @pages array, make that page (or its head page, if a
+ * compound page) dirty, if it was previously listed as clean. Then, release
+ * the page using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().
+ *
+ */
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
+{
+	__put_user_pages_dirty(pages, npages, set_page_dirty_lock);
+}
+EXPORT_SYMBOL(put_user_pages_dirty_lock);
+
+/**
+ * put_user_pages() - release an array of gup-pinned pages.
+ * @pages:  array of pages to be released.
+ * @npages: number of pages in the @pages array.
+ *
+ * For each page in the @pages array, release the page using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ */
+void put_user_pages(struct page **pages, unsigned long npages)
+{
+	unsigned long index;
+
+	for (index = 0; index < npages; index++)
+		put_user_page(pages[index]);
+}
+EXPORT_SYMBOL(put_user_pages);
+
 /*
  * get_kernel_pages() - pin kernel pages in memory
  * @kiov:	An array of struct kvec structures
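With these helpers in place, the release path of a caller such as the
hypothetical example sketched in the commit message above shrinks to a
few lines (again an editorial sketch, not part of the patch; pages,
got, and was_written are assumed caller state):

    /* Dirty (if hardware wrote to the pages) and release, via the new API. */
    if (was_written)
            put_user_pages_dirty_lock(pages, got);
    else
            put_user_pages(pages, got);

put_user_pages_dirty() is the unlocked alternative, for callers that
already know that calling set_page_dirty() without holding the page
lock is safe. Either way, each gup-pinned page is released exactly
once, which is the matching invariant that the later tracking steps (3)
and (4) rely on.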
From patchwork Fri Feb 8 07:56:49 2019
X-Patchwork-Submitter: john.hubbard@gmail.com
X-Patchwork-Id: 10802463
From: john.hubbard@gmail.com
X-Google-Original-From: jhubbard@nvidia.com
To: Andrew Morton, linux-mm@kvack.org
Cc: Al Viro, Christian Benvenuti, Christoph Hellwig, Christopher Lameter,
    Dan Williams, Dave Chinner, Dennis Dalessandro, Doug Ledford, Jan Kara,
    Jason Gunthorpe, Jerome Glisse, Matthew Wilcox, Michal Hocko,
    Mike Rapoport, Mike Marciniszyn, Ralph Campbell, Tom Talpey, LKML,
    linux-fsdevel@vger.kernel.org, John Hubbard, Jason Gunthorpe
Subject: [PATCH 2/2] infiniband/mm: convert put_page() to put_user_page*()
Date: Thu, 7 Feb 2019 23:56:49 -0800
Message-Id: <20190208075649.3025-3-jhubbard@nvidia.com>
In-Reply-To: <20190208075649.3025-1-jhubbard@nvidia.com>
References: <20190208075649.3025-1-jhubbard@nvidia.com>

From: John Hubbard

For infiniband code that retains pages via get_user_pages*(), release
those pages via the new put_user_page(), or put_user_pages*(), instead
of put_page().

This is a tiny part of the second step of fixing the problem described
in [1]. The steps are:

1) Provide put_user_page*() routines, intended to be used for releasing
   pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to invoke
   put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate
   from the existing struct page refcounting.
4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem.

Again, [1] provides details as to why that is desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

Cc: Doug Ledford
Cc: Jason Gunthorpe
Cc: Mike Marciniszyn
Cc: Dennis Dalessandro
Cc: Christian Benvenuti
Reviewed-by: Jan Kara
Reviewed-by: Dennis Dalessandro
Acked-by: Jason Gunthorpe
Signed-off-by: John Hubbard
---
 drivers/infiniband/core/umem.c              |  7 ++++---
 drivers/infiniband/core/umem_odp.c          |  2 +-
 drivers/infiniband/hw/hfi1/user_pages.c     | 11 ++++-------
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ++++-------
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
 drivers/infiniband/hw/usnic/usnic_uiom.c    |  7 ++++---
 7 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c6144df47ea4..c2898bc7b3b2 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -58,9 +58,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
 
 		page = sg_page(sg);
-		if (!PageDirty(page) && umem->writable && dirty)
-			set_page_dirty_lock(page);
-		put_page(page);
+		if (umem->writable && dirty)
+			put_user_pages_dirty_lock(&page, 1);
+		else
+			put_user_page(page);
 	}
 
 	sg_free_table(&umem->sg_head);

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index acb882f279cb..d32757c1f77e 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -663,7 +663,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 				ret = -EFAULT;
 				break;
 			}
-			put_page(local_page_list[j]);
+			put_user_page(local_page_list[j]);
 			continue;
 		}

diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index e341e6dcc388..99ccc0483711 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -121,13 +121,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 			     size_t npages, bool dirty)
 {
-	size_t i;
-
-	for (i = 0; i < npages; i++) {
-		if (dirty)
-			set_page_dirty_lock(p[i]);
-		put_page(p[i]);
-	}
+	if (dirty)
+		put_user_pages_dirty_lock(p, npages);
+	else
+		put_user_pages(p, npages);
 
 	if (mm) { /* during close after signal, mm can be NULL */
 		down_write(&mm->mmap_sem);

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 112d2f38e0de..99108f3dcf01 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -481,7 +481,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 	ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
 	if (ret < 0) {
-		put_page(pages[0]);
+		put_user_page(pages[0]);
 		goto out;
 	}
 
@@ -489,7 +489,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 				 mthca_uarc_virt(dev, uar, i));
 	if (ret) {
 		pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-		put_page(sg_page(&db_tab->page[i].mem));
+		put_user_page(sg_page(&db_tab->page[i].mem));
 		goto out;
 	}
 
@@ -555,7 +555,7 @@ void mthca_cleanup_user_db_tab(struct mthca_dev *dev, struct mthca_uar *uar,
 		if (db_tab->page[i].uvirt) {
 			mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1);
 			pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-			put_page(sg_page(&db_tab->page[i].mem));
+			put_user_page(sg_page(&db_tab->page[i].mem));
 		}
 	}

diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 16543d5e80c3..1a5c64c8695f 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,13 +40,10 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 				     int dirty)
 {
-	size_t i;
-
-	for (i = 0; i < num_pages; i++) {
-		if (dirty)
-			set_page_dirty_lock(p[i]);
-		put_page(p[i]);
-	}
+	if (dirty)
+		put_user_pages_dirty_lock(p, num_pages);
+	else
+		put_user_pages(p, num_pages);
 }
 
 /*

diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index 31c523b2a9f5..a1a1ec4adffc 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -320,7 +320,7 @@ static int qib_user_sdma_page_to_frags(const struct qib_devdata *dd,
 		 * the caller can ignore this page.
 		 */
 		if (put) {
-			put_page(page);
+			put_user_page(page);
 		} else {
 			/* coalesce case */
 			kunmap(page);
@@ -634,7 +634,7 @@ static void qib_user_sdma_free_pkt_frag(struct device *dev,
 			kunmap(pkt->addr[i].page);
 
 			if (pkt->addr[i].put_page)
-				put_page(pkt->addr[i].page);
+				put_user_page(pkt->addr[i].page);
 			else
 				__free_page(pkt->addr[i].page);
 		} else if (pkt->addr[i].kvaddr) {
@@ -709,7 +709,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
 	/* if error, return all pages not managed by pkt */
 free_pages:
 	while (i < j)
-		put_page(pages[i++]);
+		put_user_page(pages[i++]);
 
 done:
 	return ret;

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 49275a548751..2ef8d31dc838 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -77,9 +77,10 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty)
 		for_each_sg(chunk->page_list, sg, chunk->nents, i) {
 			page = sg_page(sg);
 			pa = sg_phys(sg);
-			if (!PageDirty(page) && dirty)
-				set_page_dirty_lock(page);
-			put_page(page);
+			if (dirty)
+				put_user_pages_dirty_lock(&page, 1);
+			else
+				put_user_page(page);
 			usnic_dbg("pa: %pa\n", &pa);
 		}
 		kfree(chunk);
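A note on the scatterlist conversions above (umem.c and usnic_uiom.c):
the open-coded !PageDirty(page) check does not disappear; it moves into
__put_user_pages_dirty() from patch 1, which also canonicalizes to the
compound head page first. So the one-element-array call:

    put_user_pages_dirty_lock(&page, 1);

is roughly equivalent to the following (sketch based on the patch 1
helper; "head" is an illustrative local variable):

    struct page *head = compound_head(page);

    if (!PageDirty(head))
            set_page_dirty_lock(head);
    put_user_page(head);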