[v2,15/17] libnvdimm: Set numa_node to NVDIMM devices

Message ID 1435257283.13411.4.camel@intel.com (mailing list archive)
State New, archived

Commit Message

Dan Williams June 25, 2015, 6:34 p.m. UTC
On Thu, 2015-06-25 at 11:45 -0600, Toshi Kani wrote:
> On Thu, 2015-06-25 at 05:37 -0400, Dan Williams wrote:
> > From: Toshi Kani <toshi.kani@hp.com>
> > 
> > ACPI NFIT table has System Physical Address Range Structure entries that
> > describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
> > set in the flags.
> > 
> > Change acpi_nfit_register_region() to map a proximity ID to its node ID,
> > and set it to a new numa_node field of nd_region_desc, which is then
> > conveyed to the nd_region device.
> > 
> > The device core arranges for btt and namespace devices to inherit their
> > node from their parent region.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > [djbw: move set_dev_node() from region 'probe' to 'create']
> 
> Sorry, I failed to mention another issue, which led me to call
> set_dev_node() in probe.  nd_async_device_register() calls device_add(),
> which does:
> 
>         /* use parent numa_node */
>         if (parent)
>                 set_dev_node(dev, dev_to_node(parent));
> 
> and overwrites numa_node to -1.  Since region's parent is ndbusN, we
> cannot set numa_node to the parent.  So, I had to set it in probe. 

In general, I still don't like leaving it up to ->probe() which is
within its rights to fail and not set the node.  How about the following
that moves it to the bus uevent code?  Should get triggered before probe
so the numa_node is valid before userspace is ever notified about the
device.

device_add() does:

        kobject_uevent(&dev->kobj, KOBJ_ADD);
        bus_probe_device(dev);

...so I think we're good, agree?  I also added a missing init of
ndr_desc.numa_node in arch/x86/kernel/pmem.c, see below.

8<-----
Subject: libnvdimm: Set numa_node to NVDIMM devices

From: Toshi Kani <toshi.kani@hp.com>

ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/kernel/pmem.c       |    1 +
 drivers/acpi/nfit.c          |    6 ++++++
 drivers/nvdimm/bus.c         |    6 ++++++
 drivers/nvdimm/nd.h          |    2 +-
 drivers/nvdimm/region_devs.c |    1 +
 include/linux/libnvdimm.h    |    1 +
 6 files changed, 16 insertions(+), 1 deletion(-)

Comments

Dan Williams June 25, 2015, 9:31 p.m. UTC | #1
On Thu, Jun 25, 2015 at 11:34 AM, Williams, Dan J
<dan.j.williams@intel.com> wrote:
> In general, I still don't like leaving it up to ->probe() which is
> within its rights to fail and not set the node.  How about the following
> that moves it to the bus uevent code?  Should get triggered before probe
> so the numa_node is valid before userspace is ever notified about the
> device.
>
> device_add() does:
>
>         kobject_uevent(&dev->kobj, KOBJ_ADD);
>         bus_probe_device(dev);
>
> ...so I think we're good, agree?  I also added a missing init of
> ndr_desc.numa_node in arch/x86/kernel/pmem.c, see below.

This looks good in a quick manual test.  It's interesting/illustrative
that I inadvertently broke the one bit of the libnvdimm sysfs
interface that did not have unit test coverage.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani June 25, 2015, 9:51 p.m. UTC | #2
On Thu, 2015-06-25 at 14:31 -0700, Dan Williams wrote:
> This looks good in a quick manual test.  It's interesting/illustrative
> that I inadvertently broke the one bit of the libnvdimm sysfs
> interface that did not have unit test coverage.

Sorry, I was interrupted.  Yes, this works fine for region &
namespace.  I'd like to check with you about btt, since the attach logic
has changed in v2.

Previously, as described in patch 16/17, bttN bound to pmem had a valid
numa_node value, and seeding btt0 had -1.

  /sys/bus/nd/devices
  |-- btt0/numa_node:-1
  |-- btt1/numa_node:0

In this version, there is an unbound (seed?) btt for each of the 4
regions (btt0-3), plus btt4 & 5 bound to pmem0 & 3 on my system.

btt0/numa_node:0
btt1/numa_node:0
btt2/numa_node:1
btt3/numa_node:1
btt4/numa_node:0
btt5/numa_node:1

btt0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt0
btt1 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/btt1
btt2 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region2/btt2
btt3 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt3
btt4 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt4
btt5 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt5

And the unbound bttNs attach to different regions after a reboot.

btt0/numa_node:0
btt1/numa_node:1
btt2/numa_node:1
btt3/numa_node:0
btt4/numa_node:0
btt5/numa_node:1

btt0 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt0
btt1 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt1
btt2 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region2/btt2
btt3 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/btt3
btt4 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt4
btt5 -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt5

Is this how you'd expect btt to work in this version?  (I have not
looked at the btt changes yet)

Thanks,
-Toshi

Dan Williams June 25, 2015, 10 p.m. UTC | #3
On Thu, Jun 25, 2015 at 2:51 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> Is this how you'd expect btt to work in this version?  (I have not
> looked at the btt changes yet)

Yes, this looks fine.

As requested by Christoph, in the latest version BTTs are child
devices of regions rather than busses.  They automatically inherit the
numa_node of the parent region.  In your dump above the numa_nodes are
not changing from boot to boot; instead, the BTTs are registered
asynchronously and so get different ids from boot to boot.  Userspace
should not care what the btt id is, and the naming trick we use to
give block devices static names would not work for BTTs.  The child
block device of the BTT will still have the static name we
discussed earlier (/dev/pmemXs or /dev/ndblkX.Ys) because the scan
order of those is deterministic.
Toshi Kani June 25, 2015, 10:11 p.m. UTC | #4
On Thu, 2015-06-25 at 15:00 -0700, Dan Williams wrote:
> Yes, this looks fine.
> 
> As requested by Christoph, in the latest version BTTs are child
> devices of regions rather than busses.  They automatically inherit the
> numa_node of the parent region.  In your dump above the numa_nodes are
> not changing from boot-to-boot, instead the BTTs are registered
> asynchronously so get different ids from boot-to-boot.  Userspace
> should not care what the btt id is and the same naming trick we use to
> give block devices static names would not work for BTTs.  The child
> block device of the BTT will still have the static name as we
> discussed earlier (/dev/pmemXs or /dev/ndblkX.Ys) because the scan
> order of those is deterministic.

Yes, I see no problem with bound BTTs and their device files.  So, how
do we bind BTT with this new version?

Thanks,
-Toshi

Dan Williams June 25, 2015, 10:34 p.m. UTC | #5
On Thu, Jun 25, 2015 at 3:11 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Thu, 2015-06-25 at 15:00 -0700, Dan Williams wrote:
> Yes, I see no problem with bound BTTs and their device files.  So, how
> do we bind BTT with this new version?
>

# cd /sys/bus/nd/devices
# uuidgen > btt6/uuid
# echo 4096 > btt6/sector_size
# echo namespace6.0 > btt6/namespace
# echo namespace6.0 > ../drivers/nd_pmem/unbind
# echo btt6 > ../drivers/nd_pmem/bind

After reboot, when the system sees namespace6.0 again it will notice
the btt instance and attach bttX instead.  The net effect is that now
you'll only ever have /dev/pmem6 or /dev/pmem6s, never both at the
same time; having both was a side effect of the stacking approach.

I'll post the patch that updates libndctl and the unit tests shortly
Toshi Kani June 25, 2015, 10:55 p.m. UTC | #6
On Thu, 2015-06-25 at 15:34 -0700, Dan Williams wrote:
> On Thu, Jun 25, 2015 at 3:11 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> > On Thu, 2015-06-25 at 15:00 -0700, Dan Williams wrote:
> > Yes, I see no problem with bound BTTs and their device files.  So, how
> > do we bind BTT with this new version?
> >
> 
> # cd /sys/bus/nd/devices
> # uuidgen > btt6/uuid
> # echo 4096 > btt6/sector_size
> # echo namespace6.0 > btt6/namespace
> # echo namespace6.0 > ../drivers/nd_pmem/unbind
> # echo btt6 > ../drivers/nd_pmem/bind
> 
> After reboot, when the system sees namespace6.0 again it will notice
> the btt instance and attach bttX instead.  The net effect is that now
> you'll only ever have /dev/pmem6 or /dev/pmem6s, never both at the
> same time; having both was a side effect of the stacking approach.
> 
> I'll post the patch that updates libndctl and the unit tests shortly

Maybe I am missing something, but I am getting errors on my system.  (I
used btt0 since there is no btt6.)

# cat bind.sh
set -x
cd /sys/bus/nd/devices
uuidgen > btt0/uuid
echo 4096 > btt0/sector_size
echo namespace0.0 > btt0/namespace
echo namespace0.0 > ../drivers/nd_pmem/unbind
echo btt0 > ../drivers/nd_pmem/bind

# sh bind.sh
+ cd /sys/bus/nd/devices
+ uuidgen
+ echo 4096
+ echo namespace0.0
bind.sh: line 6: echo: write error: Device or resource busy
+ echo namespace0.0
bind.sh: line 7: echo: write error: No such device
+ echo btt0
bind.sh: line 8: echo: write error: No such device

# dmesg
 :
[12513.839162] nd btt0: uuid_store: result: 0 wrote: b32cd195-9aae-4c54-a5ac-49adb50a8a98
[12513.880286] nd btt0: sector_size_store: result: 0 wrote: 4096
[12513.909494] nd btt0: namespace0.0 already claimed
[12513.933364] nd btt0: namespace_store: result: -16 wrote: namespace0.0
[12513.966808]  ndbus0: nd_pmem.probe(btt0) = -19

Thanks,
-Toshi

Patch

diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 0f4ef472ab9e..64f90f53bb85 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -67,6 +67,7 @@  static __init int register_e820_pmem(void)
 		memset(&ndr_desc, 0, sizeof(ndr_desc));
 		ndr_desc.res = &res;
 		ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
+		ndr_desc.numa_node = NUMA_NO_NODE;
 		if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 			goto err;
 	}
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 1f6f1b1a54f4..d96c8fe974dd 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1392,6 +1392,12 @@  static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 	ndr_desc->res = &res;
 	ndr_desc->provider_data = nfit_spa;
 	ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
+	if (spa->flags & ACPI_NFIT_PROXIMITY_VALID)
+		ndr_desc->numa_node = acpi_map_pxm_to_online_node(
+						spa->proximity_domain);
+	else
+		ndr_desc->numa_node = NUMA_NO_NODE;
+
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
 		struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
 		struct nd_mapping *nd_mapping;
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ec59f1f26d95..205344643852 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -48,6 +48,12 @@  static int to_nd_device_type(struct device *dev)
 
 static int nvdimm_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
+	/*
+	 * Ensure that region devices always have their numa node set as
+	 * early as possible.
+	 */
+	if (is_nd_pmem(dev) || is_nd_blk(dev))
+		set_dev_node(dev, to_nd_region(dev)->numa_node);
 	return add_uevent_var(env, "MODALIAS=" ND_DEVICE_MODALIAS_FMT,
 			to_nd_device_type(dev));
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index b870de9add79..72c26461835d 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -96,7 +96,7 @@  struct nd_region {
 	u16 ndr_mappings;
 	u64 ndr_size;
 	u64 ndr_start;
-	int id, num_lanes, ro;
+	int id, num_lanes, ro, numa_node;
 	void *provider_data;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8f8c7ea485f1..55b424f6ba0d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -736,6 +736,7 @@  static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	nd_region->nd_set = ndr_desc->nd_set;
 	nd_region->num_lanes = ndr_desc->num_lanes;
 	nd_region->ro = ro;
+	nd_region->numa_node = ndr_desc->numa_node;
 	ida_init(&nd_region->ns_ida);
 	dev = &nd_region->dev;
 	dev_set_name(dev, "region%d", nd_region->id);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index dc799a29ed1a..30b3deaafd51 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -89,6 +89,7 @@  struct nd_region_desc {
 	struct nd_interleave_set *nd_set;
 	void *provider_data;
 	int num_lanes;
+	int numa_node;
 };
 
 struct nvdimm_bus;