dax: enable DAX PMD support for NVDIMM device

Message ID A42D2827E934C5DAD59E3A1@lab.ntt.co.jp (mailing list archive)
State New, archived

Commit Message

Yoshimi Ichiyanagi Feb. 9, 2017, 2:45 a.m. UTC
Hello.

I use HPE 8GB NVDIMM modules on an HPE DL360G9 server. Currently DAX PMD
(2MiB pages) support is disabled for NVDIMM modules in kernel 4.10.0-rc5.

DAX PMD would be enabled if the PFN_DEV and PFN_MAP flags of the pmem
device were set when checked in dax_pmd_insert_mapping().

But "PFN_DEV and PFN_MAP" was not set at pmem_attach_disk() with HPE NVDIMM
modules. Because the pmem_should_map_pages() did not return true at
pmem_attach_disk(). 

pmem_should_map_pages() would return true, and DAX PMD would be enabled,
if the ND_REGION_PAGEMAP flag were set in the nd_region flags.

In this case the nd_region was initialized by acpi_nfit_register_region(),
which did not set ND_REGION_PAGEMAP in the nd_region flags, so DAX PMD
was disabled.
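
For reference, here is my reading of that chain as a simplified sketch
(abbreviated from the 4.10 sources, not verbatim):

/* drivers/nvdimm/pmem.c, simplified: pmem_attach_disk() only adds
 * PFN_MAP to the pmem device's pfn_flags when this returns true, and
 * that in turn requires ND_REGION_PAGEMAP on the parent nd_region. */
static bool pmem_should_map_pages(struct device *dev)
{
        struct nd_region *nd_region = to_nd_region(dev->parent);

        if (!IS_ENABLED(CONFIG_ZONE_DEVICE))
                return false;

        return test_bit(ND_REGION_PAGEMAP, &nd_region->flags);
}

/* fs/dax.c, simplified: the PMD fault path gives up and falls back to
 * 4K PTEs unless the pfn carries both PFN_DEV and PFN_MAP, i.e. unless
 * struct pages exist for the range. */
        if (!pfn_t_devmap(dax.pfn))
                return VM_FAULT_FALLBACK; /* simplified */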

Is it OK to set ND_REGION_PAGEMAP in the flags of PM- and VOLATILE-type
nd_regions?

Here is the fio-2.16 script (mmap.fio file) I used for my testing:

[global]
bs=4k
size=2G
directory=/mnt/pmem1
ioengine=mmap
rw=write

I did the following:
# mkfs.ext4 /dev/pmem1
# mount -t ext4 -o dax /dev/pmem1 /mnt/pmem1
# fio mmap.fio

Here are the performance results (ND_REGION_PAGEMAP flag off):
Run status group 0 (all jobs):
  WRITE: bw=1228MiB/s (1287MB/s), 1228MiB/s-1228MiB/s (1287MB/s-1287MB/s), 
io=2048MiB (2147MB), run=1668-1668msec


Here are the performance results (ND_REGION_PAGEMAP flag on, with the
following patch):
Run status group 0 (all jobs):
  WRITE: bw=3459MiB/s (3628MB/s), 3459MiB/s-3459MiB/s (3628MB/s-3628MB/s), 
io=2048MiB (2147MB), run=592-592msec



diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 7361d00..1d3bd5a 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2096,7 +2096,7 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
        struct acpi_nfit_system_address *spa = nfit_spa->spa;
        struct nd_blk_region_desc *ndbr_desc;
        struct nfit_mem *nfit_mem;
-       int blk_valid = 0;
+       int blk_valid = -1;

        if (!nvdimm) {
                dev_err(acpi_desc->dev, "spa%d dimm: %#x not found\n",
@@ -2116,6 +2116,7 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
                if (!nfit_mem || !nfit_mem->bdw) {
                        dev_dbg(acpi_desc->dev, "spa%d %s missing bdw\n",
                                        spa->range_index, nvdimm_name(nvdimm));
+                       blk_valid = 0;
                } else {
                        mapping->size = nfit_mem->bdw->capacity;
                        mapping->start = nfit_mem->bdw->start_address;
@@ -2135,6 +2136,9 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
                break;
        }

+       if (blk_valid < 0)
+               set_bit(ND_REGION_PAGEMAP, &ndr_desc->flags);
+
        return 0;
 }

Comments

Dan Williams Feb. 9, 2017, 3:14 a.m. UTC | #1
You can achieve the same by putting your namespace into "memory" mode
with the following ndctl command.  Note that this will destroy the
current contents of the namespace, and you must unmount /dev/pmem1
before running this.

    ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force

This arranges for struct page / memmap entries to be created for the namespace.

On Wed, Feb 8, 2017 at 6:45 PM, Yoshimi Ichiyanagi
<ichiyanagi.yoshimi@lab.ntt.co.jp> wrote:
> Hello.
>
> I use HPE 8GB NVDIMM modules on an HPE DL360G9 server. Currently DAX PMD
> (2MiB pages) support is disabled for NVDIMM modules in kernel 4.10.0-rc5.
>
> DAX PMD would be enabled if the PFN_DEV and PFN_MAP flags of the pmem
> device were set when checked in dax_pmd_insert_mapping().
>
> But "PFN_DEV and PFN_MAP" was not set at pmem_attach_disk() with HPE NVDIMM
> modules. Because the pmem_should_map_pages() did not return true at
> pmem_attach_disk().
>
> pmem_should_map_pages() would return true, and DAX PMD would be enabled,
> if the ND_REGION_PAGEMAP flag were set in the nd_region flags.
>
> In this case the nd_region was initialized by acpi_nfit_register_region(),
> which did not set ND_REGION_PAGEMAP in the nd_region flags, so DAX PMD
> was disabled.
>
> Is it OK to set ND_REGION_PAGEMAP in the flags of PM- and VOLATILE-type
> nd_regions?
>
> Here is the fio-2.16 script (mmap.fio file) I used for my testing:
>
> [global]
> bs=4k
> size=2G
> directory=/mnt/pmem1
> ioengine=mmap
> rw=write
>
> I did the following:
> # mkfs.ext4 /dev/pmem1
> # mount -t ext4 -o dax /dev/pmem1 /mnt/pmem1
> # fio mmap.fio
>
> Here are the performance results (ND_REGION_PAGEMAP flag off):
> Run status group 0 (all jobs):
>   WRITE: bw=1228MiB/s (1287MB/s), 1228MiB/s-1228MiB/s (1287MB/s-1287MB/s),
> io=2048MiB (2147MB), run=1668-1668msec
>
>
> Here are the performance results (ND_REGION_PAGEMAP flag on, with the
> following patch):
> Run status group 0 (all jobs):
>   WRITE: bw=3459MiB/s (3628MB/s), 3459MiB/s-3459MiB/s (3628MB/s-3628MB/s),
> io=2048MiB (2147MB), run=592-592msec
>
>
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 7361d00..1d3bd5a 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -2096,7 +2096,7 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
>         struct acpi_nfit_system_address *spa = nfit_spa->spa;
>         struct nd_blk_region_desc *ndbr_desc;
>         struct nfit_mem *nfit_mem;
> -       int blk_valid = 0;
> +       int blk_valid = -1;
>
>         if (!nvdimm) {
>                 dev_err(acpi_desc->dev, "spa%d dimm: %#x not found\n",
> @@ -2116,6 +2116,7 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
>                 if (!nfit_mem || !nfit_mem->bdw) {
>                         dev_dbg(acpi_desc->dev, "spa%d %s missing bdw\n",
>                                         spa->range_index, nvdimm_name(nvdimm));
> +                       blk_valid = 0;
>                 } else {
>                         mapping->size = nfit_mem->bdw->capacity;
>                         mapping->start = nfit_mem->bdw->start_address;
> @@ -2135,6 +2136,9 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
>                 break;
>         }
>
> +       if (blk_valid < 0)
> +               set_bit(ND_REGION_PAGEMAP, &ndr_desc->flags);
> +
>         return 0;
>  }
>
>
>
>
Yoshimi Ichiyanagi Feb. 9, 2017, 6:19 a.m. UTC | #2
Thank you. I was able to enable DAX PMD support for the HPE NVDIMM modules.

The following ndctl command changed the nvdimm device type 
from "namespace_io_device_type" to "nd_pfn_device_type", right?


I did the following: 
------
# ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force
{
  "dev":"namespace1.0",
  "mode":"memory",
  "size":33820770304,
  "uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "blockdev":"pmem1"
}

# ndctl list --namespaces
{
  "provider":"ACPI.NFIT",
  "dev":"ndbus0",
  "namespaces":[
    {
      "dev":"namespace1.0",
      "mode":"memory",
      "size":33820770304,
      "uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
      "blockdev":"pmem1"
    },
    {
      "dev":"namespace0.0",
      "mode":"raw",
      "size":34359738368,
      "blockdev":"pmem0"
    }
  ]
}

# 
------


>Dan Williams <dan.j.williams@intel.com> wrote:
>You can achieve the same by putting your namespace into "memory" mode
>with the following ndctl command.  Note that this will destroy the
>current contents of the namespace, and you must unmount /dev/pmem1
>before running this.
>
>    ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force
>
>This arranges for struct page / memmap entries to be created for the 
>namespace.
>
Dan Williams Feb. 9, 2017, 7:05 p.m. UTC | #3
On Wed, Feb 8, 2017 at 10:19 PM, Yoshimi Ichiyanagi
<ichiyanagi.yoshimi@lab.ntt.co.jp> wrote:
> Thank you. I was able to enable DAX PMD support for the HPE NVDIMM modules.
>
> The following ndctl command changed the nvdimm device type
> from "namespace_io_device_type" to "nd_pfn_device_type", right?

Not quite... the namespace is still namespace_io_device_type, but we
assign it to an nd_pfn_device_type instance. I.e. something like:

    echo namespace0.0 > /sys/bus/nd/devices/pfn0.0/namespace

Then when we enable that pfn0.0 device it writes metadata to
namespace0.0 indicating that the kernel should allocate memmap entries
for that address range.
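
In code terms the effect on pmem_attach_disk() is roughly this (a
simplified sketch of the 4.10 code paths, not verbatim):

        if (is_nd_pfn(dev)) {
                /* "memory" mode: the pfn metadata tells the kernel to
                 * allocate memmap / struct page entries for the range */
                addr = devm_memremap_pages(dev, &pfn_res,
                                &q->q_usage_counter, altmap);
                pmem->pfn_flags |= PFN_MAP;
        } else if (pmem_should_map_pages(dev)) {
                /* region advertises ND_REGION_PAGEMAP: map with pages
                 * directly, no on-media metadata needed */
                addr = devm_memremap_pages(dev, &nsio->res,
                                &q->q_usage_counter, NULL);
                pmem->pfn_flags |= PFN_MAP;
        } else
                /* raw mode: no struct pages, so PMD faults fall back */
                addr = devm_memremap(dev, pmem->phys_addr,
                                pmem->size, ARCH_MEMREMAP_PMEM);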

> I did the following:
> ------
> # ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force
> {
>   "dev":"namespace1.0",
>   "mode":"memory",
>   "size":33820770304,
>   "uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
>   "blockdev":"pmem1"
> }

Yup, looks good.

Patch

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 7361d00..1d3bd5a 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2096,7 +2096,7 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
        struct acpi_nfit_system_address *spa = nfit_spa->spa;
        struct nd_blk_region_desc *ndbr_desc;
        struct nfit_mem *nfit_mem;
-       int blk_valid = 0;
+       int blk_valid = -1;

        if (!nvdimm) {