From patchwork Thu Jun 25 18:34:47 2015
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 6677161
From: "Williams, Dan J"
To: "toshi.kani@hp.com"
CC: "linux-kernel@vger.kernel.org", "mingo@kernel.org", "hch@lst.de",
 "axboe@kernel.dk", "linux-nvdimm@lists.01.org",
 "linux-fsdevel@vger.kernel.org", "linux-acpi@vger.kernel.org",
 "boaz@plexistor.com"
Subject: Re: [PATCH v2 15/17] libnvdimm: Set numa_node to NVDIMM devices
Date: Thu, 25 Jun 2015 18:34:47 +0000
Message-ID: <1435257283.13411.4.camel@intel.com>
References: <20150625090554.40066.69562.stgit@dwillia2-desk3.jf.intel.com>
 <20150625093738.40066.88750.stgit@dwillia2-desk3.jf.intel.com>
 <1435254317.11808.327.camel@misato.fc.hp.com>
In-Reply-To: <1435254317.11808.327.camel@misato.fc.hp.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Thu, 2015-06-25 at 11:45 -0600, Toshi Kani wrote:
> On Thu, 2015-06-25 at 05:37 -0400, Dan Williams wrote:
> > From: Toshi Kani
> >
> > ACPI NFIT table has System Physical Address Range Structure entries that
> > describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
> > set in the flags.
> >
> > Change acpi_nfit_register_region() to map a proximity ID to its node ID,
> > and set it to a new numa_node field of nd_region_desc, which is then
> > conveyed to the nd_region device.
> >
> > The device core arranges for btt and namespace devices to inherit their
> > node from their parent region.
> >
> > Signed-off-by: Toshi Kani
> > [djbw: move set_dev_node() from region 'probe' to 'create']
>
> Sorry, I failed to mention other issue, which led me call set_dev_node()
> in probe.  nd_async_device_register() calls device_add(), which does:
>
> 	/* use parent numa_node */
> 	if (parent)
> 		set_dev_node(dev, dev_to_node(parent));
>
> and overwrites numa_node to -1.  Since region's parent is ndbusN, we
> cannot set numa_node to the parent.  So, I had to set it in probe.

In general, I still don't like leaving it up to ->probe() which is
within its rights to fail and not set the node.  How about the following
that moves it to the bus uevent code?  Should get triggered before probe
so the numa_node is valid before userspace is ever notified about the
device.

device_add() does:

	kobject_uevent(&dev->kobj, KOBJ_ADD);
	bus_probe_device(dev);

...so I think we're good, agree?  I also added a missing init of
ndr_desc.numa_node in arch/x86/kernel/pmem.c, see below.

8<-----
Subject: libnvdimm: Set numa_node to NVDIMM devices

From: Toshi Kani

ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.

Signed-off-by: Toshi Kani
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams
---
 arch/x86/kernel/pmem.c       |    1 +
 drivers/acpi/nfit.c          |    6 ++++++
 drivers/nvdimm/bus.c         |    6 ++++++
 drivers/nvdimm/nd.h          |    2 +-
 drivers/nvdimm/region_devs.c |    1 +
 include/linux/libnvdimm.h    |    1 +
 6 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 0f4ef472ab9e..64f90f53bb85 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -67,6 +67,7 @@ static __init int register_e820_pmem(void)
 		memset(&ndr_desc, 0, sizeof(ndr_desc));
 		ndr_desc.res = &res;
 		ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
+		ndr_desc.numa_node = NUMA_NO_NODE;
 		if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 			goto err;
 	}
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 1f6f1b1a54f4..d96c8fe974dd 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1392,6 +1392,12 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 	ndr_desc->res = &res;
 	ndr_desc->provider_data = nfit_spa;
 	ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
+	if (spa->flags & ACPI_NFIT_PROXIMITY_VALID)
+		ndr_desc->numa_node = acpi_map_pxm_to_online_node(
+						spa->proximity_domain);
+	else
+		ndr_desc->numa_node = NUMA_NO_NODE;
+
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
 		struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
 		struct nd_mapping *nd_mapping;
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ec59f1f26d95..205344643852 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -48,6 +48,12 @@ static int to_nd_device_type(struct device *dev)
 
 static int nvdimm_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
+	/*
+	 * Ensure that region devices always have their numa node set as
+	 * early as possible.
+	 */
+	if (is_nd_pmem(dev) || is_nd_blk(dev))
+		set_dev_node(dev, to_nd_region(dev)->numa_node);
 	return add_uevent_var(env, "MODALIAS=" ND_DEVICE_MODALIAS_FMT,
 			to_nd_device_type(dev));
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index b870de9add79..72c26461835d 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -96,7 +96,7 @@ struct nd_region {
 	u16 ndr_mappings;
 	u64 ndr_size;
 	u64 ndr_start;
-	int id, num_lanes, ro;
+	int id, num_lanes, ro, numa_node;
 	void *provider_data;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8f8c7ea485f1..55b424f6ba0d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -736,6 +736,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	nd_region->nd_set = ndr_desc->nd_set;
 	nd_region->num_lanes = ndr_desc->num_lanes;
 	nd_region->ro = ro;
+	nd_region->numa_node = ndr_desc->numa_node;
 	ida_init(&nd_region->ns_ida);
 	dev = &nd_region->dev;
 	dev_set_name(dev, "region%d", nd_region->id);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index dc799a29ed1a..30b3deaafd51 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -89,6 +89,7 @@ struct nd_region_desc {
 	struct nd_interleave_set *nd_set;
 	void *provider_data;
 	int num_lanes;
+	int numa_node;
 };
 
 struct nvdimm_bus;
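
For anyone who wants to poke at the ordering argument without booting a
kernel, here is a rough userspace model of it.  It is only a sketch, not
kernel code: struct model_dev, enum node_policy and every model_*() function
are made-up names, and "a failing probe returns before it sets the node" is
an assumption about how a probe-time assignment could go wrong.  The model
follows the three steps discussed in this thread: inherit the parent's node,
run the bus uevent callback, then probe.

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical knob: where the region's node gets assigned. */
enum node_policy { SET_IN_PROBE, SET_IN_UEVENT };

/* Hypothetical stand-in for the few struct device fields that matter here. */
struct model_dev {
	struct model_dev *parent;
	int numa_node;		/* what dev_to_node() would report */
	int region_node;	/* node recorded at create time */
	int node_at_uevent;	/* node visible when KOBJ_ADD is emitted */
};

/* Models the bus uevent callback, which runs before probe. */
static void model_uevent(struct model_dev *dev, enum node_policy policy)
{
	if (policy == SET_IN_UEVENT)
		dev->numa_node = dev->region_node;
	dev->node_at_uevent = dev->numa_node;
}

/* Models a driver probe that may fail before it touches the node. */
static int model_probe(struct model_dev *dev, enum node_policy policy,
		       int probe_fails)
{
	if (probe_fails)
		return -1;
	if (policy == SET_IN_PROBE)
		dev->numa_node = dev->region_node;
	return 0;
}

/* Models the device_add() steps discussed in the thread, in order. */
static void model_device_add(struct model_dev *dev, enum node_policy policy,
			     int probe_fails)
{
	/* use parent numa_node: this clobbers anything set at create time */
	if (dev->parent)
		dev->numa_node = dev->parent->numa_node;
	model_uevent(dev, policy);
	(void)model_probe(dev, policy, probe_fails);
}

/* Registers a region under a bus that has no node of its own. */
static struct model_dev model_add_region(int node, enum node_policy policy,
					 int probe_fails)
{
	struct model_dev bus = { NULL, NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE };
	struct model_dev region = { &bus, node, node, NUMA_NO_NODE };

	model_device_add(&region, policy, probe_fails);
	region.parent = NULL;	/* bus goes out of scope on return */
	return region;
}

/* Node the region ends up with once model_device_add() has finished. */
int model_final_node(int node, enum node_policy policy, int probe_fails)
{
	return model_add_region(node, policy, probe_fails).numa_node;
}

/* Node userspace would observe at the time of the add uevent. */
int model_node_at_uevent(int node, enum node_policy policy, int probe_fails)
{
	return model_add_region(node, policy, probe_fails).node_at_uevent;
}
```

In this model, model_final_node(1, SET_IN_PROBE, 1) yields NUMA_NO_NODE
because the failing probe never assigns the node, and
model_node_at_uevent(1, SET_IN_PROBE, 0) is also NUMA_NO_NODE because the
uevent fires first.  With SET_IN_UEVENT both calls return 1 whether or not
probe succeeds, which is the property the bus.c hunk is after.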