[4/6] mm/gup: track gup-pinned pages

From: John Hubbard <jhubbard@nvidia.com>

Now that all callers of get_user_pages*() have been updated to use
put_user_page(), instead of put_page(), add tracking of such
"gup-pinned" pages. The purpose of this tracking is to answer the
question "has this page been pinned by a call to get_user_pages()?"

In order to answer that, refcounting is required. get_user_pages() and all
its variants increment a reference count, and put_user_page() and its
variants decrement that reference count. If the net count is *effectively*
non-zero (see below), then the page is considered gup-pinned.

What to do in response to encountering such a page is left to later
patchsets. There is discussion about this in [1], and in an upcoming patch
that adds:

   Documentation/vm/get_user_pages.rst

So, this patch simply adds tracking of such pages.  In order to achieve
this without using up any more bits or fields in struct page, the
page->_refcount field is overloaded.  Each gup pin adds a large value
(1024) to the count, instead of 1.  This provides a way to say, "either this
page is gup-pinned, or you have a *lot* of references on it, and thus this
is a false positive".  False positives are generally OK, as long as they
are expected to be rare: taking action for a page that looks gup-pinned,
but is not, is not going to be a problem.  It's false negatives (failing
to detect a gup-pinned page) that would be a problem, and those won't
happen with this approach.
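
As an illustration of the biased-count idea above, here is a minimal
user-space sketch in C.  It is not the code in this patch: struct
model_page, PIN_BIAS and every model_*() function are hypothetical names,
and C11 atomics stand in for the kernel's page reference helpers.  It only
shows why a threshold test on a shared counter can report false positives
but not false negatives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical name: a gup pin adds this much to the count instead of 1. */
#define PIN_BIAS 1024

/* Stand-in for struct page; only the overloaded refcount is modeled. */
struct model_page {
	atomic_int refcount;
};

/* Ordinary reference, as get_page()/put_page() would take and drop. */
static void model_get_page(struct model_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

static void model_put_page(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/* gup-style pin and unpin: same counter, but moved by PIN_BIAS. */
static void model_gup_pin(struct model_page *p)
{
	atomic_fetch_add(&p->refcount, PIN_BIAS);
}

static void model_put_user_page(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, PIN_BIAS);
}

/* True if the page is pinned, or merely has at least PIN_BIAS references. */
static bool model_maybe_gup_pinned(struct model_page *p)
{
	return atomic_load(&p->refcount) >= PIN_BIAS;
}

/* Walks through the cases from the text; returns the final refcount. */
static int model_demo(void)
{
	struct model_page page;

	atomic_init(&page.refcount, 1);            /* one ordinary reference */
	assert(!model_maybe_gup_pinned(&page));

	model_gup_pin(&page);                      /* count is now 1 + 1024 */
	assert(model_maybe_gup_pinned(&page));     /* a pin is never missed */

	model_put_user_page(&page);                /* back to 1 */
	assert(!model_maybe_gup_pinned(&page));

	for (int i = 0; i < PIN_BIAS; i++)         /* 1024 ordinary references */
		model_get_page(&page);
	assert(model_maybe_gup_pinned(&page));     /* false positive, no pin held */

	model_put_page(&page);                     /* drop one: 1024, still >= bias */
	return atomic_load(&page.refcount);
}
```

In this model, model_demo() ends with a page that holds no pins but still
tests as pinned, which is the rare false positive case described above.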

This takes advantage of two distinct, pre-existing lock-free algorithms:

a) get_user_pages() and things such as page_mkclean() both operate on
   page table entries without taking locks. This relies partly on
   letting the CPU hardware (which of course also never takes locks to
   use its own page tables) simply take page faults if something has
   changed.

b) page_cache_get_speculative(), called by get_user_pages(), is a way to
   avoid having pages get freed out from under get_user_pages() or other
   things that want to pin pages.
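
The idea behind (b) can be sketched in user-space C as an "add unless
zero" loop.  This is an assumption-laden model, not the kernel
implementation: struct spec_page and the spec_*() functions are
hypothetical, and C11 compare-and-swap stands in for the kernel's atomic
primitives.  The point is that a reference is only taken if the count is
still non-zero, so a page that is already being freed is never revived.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for struct page; only the refcount is modeled. */
struct spec_page {
	atomic_int refcount;
};

/*
 * Add nr references unless the count is zero.  A zero count means the
 * page is on its way to being freed, so the caller must back off.
 */
static bool spec_ref_add_unless_zero(struct spec_page *p, int nr)
{
	int old = atomic_load(&p->refcount);

	do {
		if (old == 0)
			return false;
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&p->refcount, &old, old + nr));

	return true;
}

/* Tries to pin a live page and a dying page; returns the live count. */
static int spec_demo(void)
{
	struct spec_page live, dying;
	bool ok;

	atomic_init(&live.refcount, 1);
	atomic_init(&dying.refcount, 0);

	ok = spec_ref_add_unless_zero(&live, 1024);
	assert(ok);
	assert(atomic_load(&live.refcount) == 1025);

	ok = spec_ref_add_unless_zero(&dying, 1024);
	assert(!ok);                               /* never resurrected */
	assert(atomic_load(&dying.refcount) == 0);

	return atomic_load(&live.refcount);
}
```

No lock is taken on either path: a racing free simply makes the attempt
fail, and the caller can retry or give up.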

As a result, this patch is not expected to change performance in any
noticeable way.

Testing this requires converting a lot of get_user_pages() call sites
over to put_user_page().  I did that locally, and here is an fio run on
an NVMe drive, using this fio configuration file:

    [reader]
    direct=1
    ioengine=libaio
    blocksize=4096
    size=1g
    numjobs=1
    rw=read
    iodepth=64

reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B,
        (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.3
Starting 1 process
Jobs: 1 (f=1)
reader: (groupid=0, jobs=1): err= 0: pid=7011: Sun Feb  3 20:36:51 2019
   read: IOPS=190k, BW=741MiB/s (778MB/s)(1024MiB/1381msec)
    slat (nsec): min=2716, max=57255, avg=4048.14, stdev=1084.10
    clat (usec): min=20, max=12485, avg=332.63, stdev=191.77
     lat (usec): min=22, max=12498, avg=336.72, stdev=192.07
    clat percentiles (usec):
     |  1.00th=[  322],  5.00th=[  322], 10.00th=[  322], 20.00th=[  326],
     | 30.00th=[  326], 40.00th=[  326], 50.00th=[  326], 60.00th=[  326],
     | 70.00th=[  326], 80.00th=[  330], 90.00th=[  330], 95.00th=[  330],
     | 99.00th=[  478], 99.50th=[  717], 99.90th=[ 1074], 99.95th=[ 1090],
     | 99.99th=[12256]
   bw (  KiB/s): min=730152, max=776512, per=99.22%, avg=753332.00,
                 stdev=32781.47, samples=2
   iops        : min=182538, max=194128, avg=188333.00, stdev=8195.37,
                 samples=2
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.07%, 500=99.26%, 750=0.38%
  lat (usec)   : 1000=0.02%
  lat (msec)   : 2=0.24%, 20=0.02%
  cpu          : usr=15.07%, sys=84.13%, ctx=10, majf=0, minf=74
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
                 >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
                 >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
                 >=64=0.0%
     issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=741MiB/s (778MB/s), 741MiB/s-741MiB/s (778MB/s-778MB/s),
     io=1024MiB (1074MB), run=1381-1381msec

Disk stats (read/write):
  nvme0n1: ios=216966/0, merge=0/0, ticks=6112/0, in_queue=704, util=91.34%
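
As a sanity check on the headline numbers in the run above, the following
small C sketch recomputes them from the job parameters (size=1g,
blocksize=4096) and the reported runtime (1381 msec).  The fio_*() helper
names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Number of I/Os needed to cover size_bytes with fixed-size blocks. */
static long fio_expected_ios(long long size_bytes, long block_bytes)
{
	return (long)(size_bytes / block_bytes);
}

/* Convert a count completed in runtime_msec into a per-second rate. */
static double fio_rate_per_sec(double count, double runtime_msec)
{
	return count * 1000.0 / runtime_msec;
}

/* Checks the reported totals against the job parameters. */
static long fio_check_demo(void)
{
	const long long size_bytes = 1024LL * 1024 * 1024;  /* size=1g */
	const long block_bytes = 4096;                      /* blocksize=4096 */
	const double runtime_msec = 1381.0;                 /* run=1381msec */

	long ios = fio_expected_ios(size_bytes, block_bytes);
	double iops = fio_rate_per_sec((double)ios, runtime_msec);
	double mib_per_sec = fio_rate_per_sec(1024.0, runtime_msec);

	assert(ios == 262144);                              /* issued rwts total */
	assert(iops > 189000.0 && iops < 190500.0);         /* reported as 190k */
	assert(mib_per_sec > 741.0 && mib_per_sec < 742.0); /* reported as 741MiB/s */

	return ios;
}
```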

[1] https://lwn.net/Articles/753027/ "The trouble with get_user_pages()"

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Jérôme Glisse <jglisse@redhat.com>

Cc: Christian Benvenuti <benve@cisco.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Tom Talpey <tom@talpey.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mm.h      | 81 +++++++++++++++++++++++++++++------------
 include/linux/pagemap.h |  5 +++
 mm/gup.c                | 60 ++++++++++++++++++++++--------
 mm/swap.c               | 21 +++++++++++
 4 files changed, 128 insertions(+), 39 deletions(-)

Message ID	20190204052135.25784-5-jhubbard@nvidia.com (mailing list archive)
State	New, archived
Series	RFC v2: mm: gup/dma tracking
    [0/6] RFC v2: mm: gup/dma tracking
    [1/6] mm: introduce put_user_page(), placeholder versions
    [2/6] infiniband/mm: convert put_page() to put_user_page()
    [3/6] mm: page_cache_add_speculative(): refactoring
    [4/6] mm/gup: track gup-pinned pages
    [5/6] mm/gup: /proc/vmstat support for get/put user pages
    [6/6] mm/gup: Documentation/vm/get_user_pages.rst, MAINTAINERS
