From patchwork Sat Oct 10 00:55:44 2015
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 7364931
Subject: [PATCH v2 04/20] mm: introduce __get_dev_pagemap()
From: Dan Williams
To: linux-nvdimm@lists.01.org
Cc: Dave Chinner, linux-kernel@vger.kernel.org, hch@lst.de,
    linux-mm@kvack.org, Andrew Morton
Date: Fri, 09 Oct 2015 20:55:44 -0400
Message-ID: <20151010005544.17221.69747.stgit@dwillia2-desk3.jf.intel.com>
In-Reply-To: <20151010005522.17221.87557.stgit@dwillia2-desk3.jf.intel.com>
References: <20151010005522.17221.87557.stgit@dwillia2-desk3.jf.intel.com>
User-Agent: StGit/0.17.1-9-g687f

There are several scenarios where we need to retrieve and update metadata
associated with a given devm_memremap_pages() mapping, and the only lookup
key available is a pfn in the range:

1/ We want to augment vmemmap_populate() (called via arch_add_memory())
   to allocate memmap storage from pre-allocated pages reserved by the
   device driver, i.e. so that vmemmap_alloc_block_buf() grabs device
   pages rather than page allocator pages.  This is in support of
   devm_memremap_pages() mappings where the memmap is too large to fit in
   main memory (i.e. large persistent memory devices).

2/ Taking a reference against the mapping when inserting device pages
   into the address_space radix tree of a given inode.  This facilitates
   unmap_mapping_range() and truncate_inode_pages() operations when the
   driver is tearing down the mapping.
3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
   reference against the mapping so that the driver teardown path can
   revoke and drain usage of device pages (see the calling-convention
   sketch after the diff).

Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Andrew Morton
Cc: Ross Zwisler
Signed-off-by: Dan Williams
---
 include/linux/mm.h |   18 ++++++++++++++++++
 kernel/memremap.c  |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80001de019ba..30c3c8764649 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -717,6 +717,24 @@ static inline enum zone_type page_zonenum(const struct page *page)
 	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
+/**
+ * struct dev_pagemap - metadata for ZONE_DEVICE mappings
+ * @dev: host device of the mapping for debug
+ */
+struct dev_pagemap {
+	/* TODO: vmem_altmap and percpu_ref count */
+	struct device *dev;
+};
+
+#ifdef CONFIG_ZONE_DEVICE
+struct dev_pagemap *__get_dev_pagemap(resource_size_t phys);
+#else
+static inline struct dev_pagemap *__get_dev_pagemap(resource_size_t phys)
+{
+	return NULL;
+}
+#endif
+
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
 #endif
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 3218e8b1fc28..64bfd9fa93aa 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -10,6 +10,7 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  * General Public License for more details.
  */
+#include <linux/rculist.h>
 #include <linux/types.h>
 #include <linux/io.h>
 #include <linux/mm.h>
@@ -138,18 +139,52 @@ void devm_memunmap(struct device *dev, void *addr)
 EXPORT_SYMBOL(devm_memunmap);
 
 #ifdef CONFIG_ZONE_DEVICE
+static LIST_HEAD(ranges);
+static DEFINE_SPINLOCK(range_lock);
+
 struct page_map {
 	struct resource res;
+	struct dev_pagemap pgmap;
+	struct list_head list;
 };
 
+static void add_page_map(struct page_map *page_map)
+{
+	spin_lock(&range_lock);
+	list_add_rcu(&page_map->list, &ranges);
+	spin_unlock(&range_lock);
+}
+
+static void del_page_map(struct page_map *page_map)
+{
+	spin_lock(&range_lock);
+	list_del_rcu(&page_map->list);
+	spin_unlock(&range_lock);
+}
+
 static void devm_memremap_pages_release(struct device *dev, void *res)
 {
 	struct page_map *page_map = res;
 
+	del_page_map(page_map);
+
 	/* pages are dead and unused, undo the arch mapping */
 	arch_remove_memory(page_map->res.start, resource_size(&page_map->res));
 }
 
+/* assumes rcu_read_lock() held at entry */
+struct dev_pagemap *__get_dev_pagemap(resource_size_t phys)
+{
+	struct page_map *page_map;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	list_for_each_entry_rcu(page_map, &ranges, list)
+		if (phys >= page_map->res.start && phys <= page_map->res.end)
+			return &page_map->pgmap;
+	return NULL;
+}
+
 void *devm_memremap_pages(struct device *dev, struct resource *res)
 {
 	int is_ram = region_intersects(res->start, resource_size(res),
@@ -173,12 +208,17 @@ void *devm_memremap_pages(struct device *dev, struct resource *res)
 
 	memcpy(&page_map->res, res, sizeof(*res));
 
+	page_map->pgmap.dev = dev;
+	INIT_LIST_HEAD(&page_map->list);
+	add_page_map(page_map);
+
 	nid = dev_to_node(dev);
 	if (nid < 0)
 		nid = numa_mem_id();
 
 	error = arch_add_memory(nid, res->start, resource_size(res), true);
 	if (error) {
+		del_page_map(page_map);
 		devres_free(page_map);
 		return ERR_PTR(error);
 	}
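
(Not part of the patch: a sketch of the expected calling convention for
__get_dev_pagemap().  The helper name pfn_to_pagemap_dev() and the use of
PFN_PHYS() are illustrative placeholders, not code from this series; the
point is that the lookup must run inside rcu_read_lock()/rcu_read_unlock()
and that any needed fields are copied out before unlocking, since this
revision has no percpu_ref to pin the mapping once the read-side lock is
dropped.)

/* Illustrative only -- not part of this patch. */
#include <linux/pfn.h>		/* PFN_PHYS() */
#include <linux/rcupdate.h>	/* rcu_read_lock()/rcu_read_unlock() */
#include <linux/mm.h>		/* struct dev_pagemap, __get_dev_pagemap() */

static struct device *pfn_to_pagemap_dev(unsigned long pfn)
{
	struct dev_pagemap *pgmap;
	struct device *dev = NULL;

	rcu_read_lock();
	pgmap = __get_dev_pagemap(PFN_PHYS(pfn));
	if (pgmap)
		dev = pgmap->dev;	/* @dev is the only field in this revision */
	rcu_read_unlock();

	return dev;
}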