diff mbox series

[v4,10/45] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE

Message ID 20220701142310.2188015-11-glider@google.com (mailing list archive)
State New
Headers show
Series Add KernelMemorySanitizer infrastructure | expand

Commit Message

Alexander Potapenko July 1, 2022, 2:22 p.m. UTC
KMSAN adds extra metadata fields to struct page, so it does not fit into
64 bytes anymore.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
---
 drivers/nvdimm/nd.h       | 2 +-
 drivers/nvdimm/pfn_devs.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Marco Elver July 11, 2022, 4:26 p.m. UTC | #1
On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
>
> KMSAN adds extra metadata fields to struct page, so it does not fit into
> 64 bytes anymore.

Does this somehow cause extra space being used in all kernel configs?
If not, it would be good to note this in the commit message.


> Signed-off-by: Alexander Potapenko <glider@google.com>

Reviewed-by: Marco Elver <elver@google.com>

> ---
> Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
> ---
>  drivers/nvdimm/nd.h       | 2 +-
>  drivers/nvdimm/pfn_devs.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index ec5219680092d..85ca5b4da3cf3 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
>                 struct nd_namespace_common *ndns);
>  #if IS_ENABLED(CONFIG_ND_CLAIM)
>  /* max struct page size independent of kernel config */
> -#define MAX_STRUCT_PAGE_SIZE 64
> +#define MAX_STRUCT_PAGE_SIZE 128
>  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
>  #else
>  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 0e92ab4b32833..61af072ac98f9 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -787,7 +787,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>                  * when populating the vmemmap. This *should* be equal to
>                  * PMD_SIZE for most architectures.
>                  *
> -                * Also make sure size of struct page is less than 64. We
> +                * Also make sure size of struct page is less than 128. We
>                  * want to make sure we use large enough size here so that
>                  * we don't have a dynamic reserve space depending on
>                  * struct page size. But we also want to make sure we notice
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>
Alexander Potapenko Aug. 3, 2022, 9:41 a.m. UTC | #2
(+ Dan Williams)

On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> >
> > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > 64 bytes anymore.
>
> Does this somehow cause extra space being used in all kernel configs?
> If not, it would be good to note this in the commit message.

I actually couldn't verify this on QEMU, because the driver never got loaded.
Looks like this increases the amount of memory used by the nvdimm
driver in all kernel configs that enable it (including those that
don't use KMSAN), but I am not sure how much is that.

Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?

>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Marco Elver <elver@google.com>
Alexander Potapenko Aug. 3, 2022, 9:44 a.m. UTC | #3
(+ Dan Williams)
(resending with patch context included)

On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
>
> On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> >
> > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > 64 bytes anymore.
>
> Does this somehow cause extra space being used in all kernel configs?
> If not, it would be good to note this in the commit message.
>
I actually couldn't verify this on QEMU, because the driver never got loaded.
Looks like this increases the amount of memory used by the nvdimm
driver in all kernel configs that enable it (including those that
don't use KMSAN), but I am not sure how much is that.

Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?

>
> > Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Reviewed-by: Marco Elver <elver@google.com>
>
> > ---
> > Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
> > ---
> >  drivers/nvdimm/nd.h       | 2 +-
> >  drivers/nvdimm/pfn_devs.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > index ec5219680092d..85ca5b4da3cf3 100644
> > --- a/drivers/nvdimm/nd.h
> > +++ b/drivers/nvdimm/nd.h
> > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> >                 struct nd_namespace_common *ndns);
> >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> >  /* max struct page size independent of kernel config */
> > -#define MAX_STRUCT_PAGE_SIZE 64
> > +#define MAX_STRUCT_PAGE_SIZE 128
> >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> >  #else
> >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > index 0e92ab4b32833..61af072ac98f9 100644
> > --- a/drivers/nvdimm/pfn_devs.c
> > +++ b/drivers/nvdimm/pfn_devs.c
> > @@ -787,7 +787,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> >                  * when populating the vmemmap. This *should* be equal to
> >                  * PMD_SIZE for most architectures.
> >                  *
> > -                * Also make sure size of struct page is less than 64. We
> > +                * Also make sure size of struct page is less than 128. We
> >                  * want to make sure we use large enough size here so that
> >                  * we don't have a dynamic reserve space depending on
> >                  * struct page size. But we also want to make sure we notice
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
Dan Williams Jan. 5, 2023, 10:08 p.m. UTC | #4
Alexander Potapenko wrote:
> (+ Dan Williams)
> (resending with patch context included)
> 
> On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> >
> > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > >
> > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > 64 bytes anymore.
> >
> > Does this somehow cause extra space being used in all kernel configs?
> > If not, it would be good to note this in the commit message.
> >
> I actually couldn't verify this on QEMU, because the driver never got loaded.
> Looks like this increases the amount of memory used by the nvdimm
> driver in all kernel configs that enable it (including those that
> don't use KMSAN), but I am not sure how much is that.
> 
> Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?

Apologies I missed this several months ago. The answer is that this
causes everyone creating PMEM namespaces on v6.1+ to lose double the
capacity of their namespace even when not using KMSAN which is too
wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
something like:

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 79d93126453d..5693869b720b 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -63,6 +63,7 @@ config NVDIMM_PFN
        bool "PFN: Map persistent (device) memory"
        default LIBNVDIMM
        depends on ZONE_DEVICE
+       depends on !KMSAN
        select ND_CLAIM
        help
          Map persistent memory, i.e. advertise it to the memory


...otherwise, what was the rationale for increasing this value? Were you
actually trying to use KMSAN for DAX pages?
Alexander Potapenko Jan. 9, 2023, 9:51 a.m. UTC | #5
On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Alexander Potapenko wrote:
> > (+ Dan Williams)
> > (resending with patch context included)
> >
> > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > >
> > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > >
> > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > 64 bytes anymore.
> > >
> > > Does this somehow cause extra space being used in all kernel configs?
> > > If not, it would be good to note this in the commit message.
> > >
> > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > Looks like this increases the amount of memory used by the nvdimm
> > driver in all kernel configs that enable it (including those that
> > don't use KMSAN), but I am not sure how much is that.
> >
> > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
>
> Apologies I missed this several months ago. The answer is that this
> causes everyone creating PMEM namespaces on v6.1+ to lose double the
> capacity of their namespace even when not using KMSAN which is too
> wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> something like:
>
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 79d93126453d..5693869b720b 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -63,6 +63,7 @@ config NVDIMM_PFN
>         bool "PFN: Map persistent (device) memory"
>         default LIBNVDIMM
>         depends on ZONE_DEVICE
> +       depends on !KMSAN
>         select ND_CLAIM
>         help
>           Map persistent memory, i.e. advertise it to the memory
>
>
> ...otherwise, what was the rationale for increasing this value? Were you
> actually trying to use KMSAN for DAX pages?

I was just building the kernel with nvdimm driver and KMSAN enabled.
Because KMSAN adds extra data to every struct page, it immediately hit
the following assert:

drivers/nvdimm/pfn_devs.c:796:3: error: call to
__compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
fE
                BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);

The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
page size independent of kernel config", but maybe we can afford
making it dependent on CONFIG_KMSAN (and possibly other config options
that increase struct page size)?

I don't mind disabling the driver under KMSAN, but having an extra
ifdef to keep KMSAN support sounds reasonable, WDYT?
Dan Williams Jan. 9, 2023, 10:06 p.m. UTC | #6
Alexander Potapenko wrote:
> On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Alexander Potapenko wrote:
> > > (+ Dan Williams)
> > > (resending with patch context included)
> > >
> > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > >
> > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > >
> > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > 64 bytes anymore.
> > > >
> > > > Does this somehow cause extra space being used in all kernel configs?
> > > > If not, it would be good to note this in the commit message.
> > > >
> > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > Looks like this increases the amount of memory used by the nvdimm
> > > driver in all kernel configs that enable it (including those that
> > > don't use KMSAN), but I am not sure how much is that.
> > >
> > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> >
> > Apologies I missed this several months ago. The answer is that this
> > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > capacity of their namespace even when not using KMSAN which is too
> > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > something like:
> >
> > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > index 79d93126453d..5693869b720b 100644
> > --- a/drivers/nvdimm/Kconfig
> > +++ b/drivers/nvdimm/Kconfig
> > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> >         bool "PFN: Map persistent (device) memory"
> >         default LIBNVDIMM
> >         depends on ZONE_DEVICE
> > +       depends on !KMSAN
> >         select ND_CLAIM
> >         help
> >           Map persistent memory, i.e. advertise it to the memory
> >
> >
> > ...otherwise, what was the rationale for increasing this value? Were you
> > actually trying to use KMSAN for DAX pages?
> 
> I was just building the kernel with nvdimm driver and KMSAN enabled.
> Because KMSAN adds extra data to every struct page, it immediately hit
> the following assert:
> 
> drivers/nvdimm/pfn_devs.c:796:3: error: call to
> __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> fE
>                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> 
> The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> page size independent of kernel config", but maybe we can afford
> making it dependent on CONFIG_KMSAN (and possibly other config options
> that increase struct page size)?
> 
> I don't mind disabling the driver under KMSAN, but having an extra
> ifdef to keep KMSAN support sounds reasonable, WDYT?

How about a module parameter to opt-in to the increased permanent
capacity loss?

-- >8 --
From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Thu, 5 Jan 2023 13:27:34 -0800
Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE

Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")

...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
doubles the amount of capacity stolen from user addressable capacity for
everyone, regardless of whether they are using the debug option. Revert
that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
allow for debug scenarios to proceed with creating debug sized page maps
with a new 'libnvdimm.page_struct_override' module parameter.

Note that this only applies to cases where the page map is permanent,
i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
create-namespace" terms). For the "--map=mem" case, since the allocation
is ephemeral for the lifespan of the namespace, there are no explicit
restriction. However, the implicit restriction, of having enough
available "System RAM" to store the page map for the typically large
pmem, still applies.

Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
Cc: <stable@vger.kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
---
 drivers/nvdimm/nd.h       |  2 +-
 drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
 2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 85ca5b4da3cf..ec5219680092 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
 		struct nd_namespace_common *ndns);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
 /* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 128
+#define MAX_STRUCT_PAGE_SIZE 64
 int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 #else
 static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 61af072ac98f..978d63559c0e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -13,6 +13,11 @@
 #include "pfn.h"
 #include "nd.h"
 
+static bool page_struct_override;
+module_param(page_struct_override, bool, 0644);
+MODULE_PARM_DESC(page_struct_override,
+		 "Force namespace creation in the presence of mm-debug.");
+
 static void nd_pfn_release(struct device *dev)
 {
 	struct nd_region *nd_region = to_nd_region(dev->parent);
@@ -758,12 +763,6 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		return -ENXIO;
 	}
 
-	/*
-	 * Note, we use 64 here for the standard size of struct page,
-	 * debugging options may cause it to be larger in which case the
-	 * implementation will limit the pfns advertised through
-	 * ->direct_access() to those that are included in the memmap.
-	 */
 	start = nsio->res.start;
 	size = resource_size(&nsio->res);
 	npfns = PHYS_PFN(size - SZ_8K);
@@ -782,20 +781,33 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	}
 	end_trunc = start + size - ALIGN_DOWN(start + size, align);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
+		unsigned long page_map_size = MAX_STRUCT_PAGE_SIZE * npfns;
+
 		/*
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 *
-		 * Also make sure size of struct page is less than 128. We
-		 * want to make sure we use large enough size here so that
-		 * we don't have a dynamic reserve space depending on
-		 * struct page size. But we also want to make sure we notice
-		 * when we end up adding new elements to struct page.
+		 * Also make sure size of struct page is less than
+		 * MAX_STRUCT_PAGE_SIZE. The goal here is compatibility in the
+		 * face of production kernel configurations that reduce the
+		 * 'struct page' size below MAX_STRUCT_PAGE_SIZE. For debug
+		 * kernel configurations that increase the 'struct page' size
+		 * above MAX_STRUCT_PAGE_SIZE, the page_struct_override allows
+		 * for continuing with the capacity that will be wasted when
+		 * reverting to a production kernel configuration. Otherwise,
+		 * those configurations are blocked by default.
 		 */
-		BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
-		offset = ALIGN(start + SZ_8K + MAX_STRUCT_PAGE_SIZE * npfns, align)
-			- start;
+		if (sizeof(struct page) > MAX_STRUCT_PAGE_SIZE) {
+			if (page_struct_override)
+				page_map_size = sizeof(struct page) * npfns;
+			else {
+				dev_err(&nd_pfn->dev,
+					"Memory debug options prevent using pmem for the page map\n");
+				return -EINVAL;
+			}
+		}
+		offset = ALIGN(start + SZ_8K + page_map_size, align) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
 		offset = ALIGN(start + SZ_8K, align) - start;
 	else
@@ -818,7 +830,10 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	pfn_sb->version_minor = cpu_to_le16(4);
 	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
-	pfn_sb->page_struct_size = cpu_to_le16(MAX_STRUCT_PAGE_SIZE);
+	if (sizeof(struct page) > MAX_STRUCT_PAGE_SIZE && page_struct_override)
+		pfn_sb->page_struct_size = cpu_to_le16(sizeof(struct page));
+	else
+		pfn_sb->page_struct_size = cpu_to_le16(MAX_STRUCT_PAGE_SIZE);
 	pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
Greg Kroah-Hartman Jan. 10, 2023, 5:56 a.m. UTC | #7
On Mon, Jan 09, 2023 at 02:06:36PM -0800, Dan Williams wrote:
> Alexander Potapenko wrote:
> > On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > Alexander Potapenko wrote:
> > > > (+ Dan Williams)
> > > > (resending with patch context included)
> > > >
> > > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > > >
> > > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > > >
> > > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > > 64 bytes anymore.
> > > > >
> > > > > Does this somehow cause extra space being used in all kernel configs?
> > > > > If not, it would be good to note this in the commit message.
> > > > >
> > > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > > Looks like this increases the amount of memory used by the nvdimm
> > > > driver in all kernel configs that enable it (including those that
> > > > don't use KMSAN), but I am not sure how much is that.
> > > >
> > > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> > >
> > > Apologies I missed this several months ago. The answer is that this
> > > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > > capacity of their namespace even when not using KMSAN which is too
> > > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > > something like:
> > >
> > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > > index 79d93126453d..5693869b720b 100644
> > > --- a/drivers/nvdimm/Kconfig
> > > +++ b/drivers/nvdimm/Kconfig
> > > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> > >         bool "PFN: Map persistent (device) memory"
> > >         default LIBNVDIMM
> > >         depends on ZONE_DEVICE
> > > +       depends on !KMSAN
> > >         select ND_CLAIM
> > >         help
> > >           Map persistent memory, i.e. advertise it to the memory
> > >
> > >
> > > ...otherwise, what was the rationale for increasing this value? Were you
> > > actually trying to use KMSAN for DAX pages?
> > 
> > I was just building the kernel with nvdimm driver and KMSAN enabled.
> > Because KMSAN adds extra data to every struct page, it immediately hit
> > the following assert:
> > 
> > drivers/nvdimm/pfn_devs.c:796:3: error: call to
> > __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> > fE
> >                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> > 
> > The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> > page size independent of kernel config", but maybe we can afford
> > making it dependent on CONFIG_KMSAN (and possibly other config options
> > that increase struct page size)?
> > 
> > I don't mind disabling the driver under KMSAN, but having an extra
> > ifdef to keep KMSAN support sounds reasonable, WDYT?
> 
> How about a module parameter to opt-in to the increased permanent
> capacity loss?

Please no, this isn't the 1990's, we should never force users to keep
track of new module parameters that you then have to support for
forever.


> 
> -- >8 --
> >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> From: Dan Williams <dan.j.williams@intel.com>
> Date: Thu, 5 Jan 2023 13:27:34 -0800
> Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> 
> Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> 
> ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> doubles the amount of capacity stolen from user addressable capacity for
> everyone, regardless of whether they are using the debug option. Revert
> that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> allow for debug scenarios to proceed with creating debug sized page maps
> with a new 'libnvdimm.page_struct_override' module parameter.
> 
> Note that this only applies to cases where the page map is permanent,
> i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> create-namespace" terms). For the "--map=mem" case, since the allocation
> is ephemeral for the lifespan of the namespace, there are no explicit
> restriction. However, the implicit restriction, of having enough
> available "System RAM" to store the page map for the typically large
> pmem, still applies.
> 
> Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> Cc: <stable@vger.kernel.org>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Marco Elver <elver@google.com>
> Reported-by: Jeff Moyer <jmoyer@redhat.com>
> ---
>  drivers/nvdimm/nd.h       |  2 +-
>  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
>  2 files changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index 85ca5b4da3cf..ec5219680092 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
>  		struct nd_namespace_common *ndns);
>  #if IS_ENABLED(CONFIG_ND_CLAIM)
>  /* max struct page size independent of kernel config */
> -#define MAX_STRUCT_PAGE_SIZE 128
> +#define MAX_STRUCT_PAGE_SIZE 64
>  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
>  #else
>  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 61af072ac98f..978d63559c0e 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -13,6 +13,11 @@
>  #include "pfn.h"
>  #include "nd.h"
>  
> +static bool page_struct_override;
> +module_param(page_struct_override, bool, 0644);
> +MODULE_PARM_DESC(page_struct_override,
> +		 "Force namespace creation in the presence of mm-debug.");

I can't figure out from this description what this is for so perhaps it
should be either removed and made dynamic (if you know you want to debug
the mm core, why not turn it on then?) or made more obvious what is
happening?

thanks,

greg k-h
Dan Williams Jan. 10, 2023, 6:55 a.m. UTC | #8
Greg Kroah-Hartman wrote:
> On Mon, Jan 09, 2023 at 02:06:36PM -0800, Dan Williams wrote:
> > Alexander Potapenko wrote:
> > > On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > Alexander Potapenko wrote:
> > > > > (+ Dan Williams)
> > > > > (resending with patch context included)
> > > > >
> > > > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > > > >
> > > > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > > > >
> > > > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > > > 64 bytes anymore.
> > > > > >
> > > > > > Does this somehow cause extra space being used in all kernel configs?
> > > > > > If not, it would be good to note this in the commit message.
> > > > > >
> > > > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > > > Looks like this increases the amount of memory used by the nvdimm
> > > > > driver in all kernel configs that enable it (including those that
> > > > > don't use KMSAN), but I am not sure how much is that.
> > > > >
> > > > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> > > >
> > > > Apologies I missed this several months ago. The answer is that this
> > > > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > > > capacity of their namespace even when not using KMSAN which is too
> > > > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > > > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > > > something like:
> > > >
> > > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > > > index 79d93126453d..5693869b720b 100644
> > > > --- a/drivers/nvdimm/Kconfig
> > > > +++ b/drivers/nvdimm/Kconfig
> > > > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> > > >         bool "PFN: Map persistent (device) memory"
> > > >         default LIBNVDIMM
> > > >         depends on ZONE_DEVICE
> > > > +       depends on !KMSAN
> > > >         select ND_CLAIM
> > > >         help
> > > >           Map persistent memory, i.e. advertise it to the memory
> > > >
> > > >
> > > > ...otherwise, what was the rationale for increasing this value? Were you
> > > > actually trying to use KMSAN for DAX pages?
> > > 
> > > I was just building the kernel with nvdimm driver and KMSAN enabled.
> > > Because KMSAN adds extra data to every struct page, it immediately hit
> > > the following assert:
> > > 
> > > drivers/nvdimm/pfn_devs.c:796:3: error: call to
> > > __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> > > fE
> > >                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> > > 
> > > The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> > > page size independent of kernel config", but maybe we can afford
> > > making it dependent on CONFIG_KMSAN (and possibly other config options
> > > that increase struct page size)?
> > > 
> > > I don't mind disabling the driver under KMSAN, but having an extra
> > > ifdef to keep KMSAN support sounds reasonable, WDYT?
> > 
> > How about a module parameter to opt-in to the increased permanent
> > capacity loss?
> 
> Please no, this isn't the 1990's, we should never force users to keep
> track of new module parameters that you then have to support for
> forever.

Fair enough, premature enabling. If someone really wants this they can
find this thread in the archives and ask for another solution like
compile time override.

> 
> 
> > 
> > -- >8 --
> > >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> > From: Dan Williams <dan.j.williams@intel.com>
> > Date: Thu, 5 Jan 2023 13:27:34 -0800
> > Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> > 
> > Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > 
> > ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> > potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> > doubles the amount of capacity stolen from user addressable capacity for
> > everyone, regardless of whether they are using the debug option. Revert
> > that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> > allow for debug scenarios to proceed with creating debug sized page maps
> > with a new 'libnvdimm.page_struct_override' module parameter.
> > 
> > Note that this only applies to cases where the page map is permanent,
> > i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> > create-namespace" terms). For the "--map=mem" case, since the allocation
> > is ephemeral for the lifespan of the namespace, there are no explicit
> > restriction. However, the implicit restriction, of having enough
> > available "System RAM" to store the page map for the typically large
> > pmem, still applies.
> > 
> > Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > Cc: <stable@vger.kernel.org>
> > Cc: Alexander Potapenko <glider@google.com>
> > Cc: Marco Elver <elver@google.com>
> > Reported-by: Jeff Moyer <jmoyer@redhat.com>
> > ---
> >  drivers/nvdimm/nd.h       |  2 +-
> >  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
> >  2 files changed, 31 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > index 85ca5b4da3cf..ec5219680092 100644
> > --- a/drivers/nvdimm/nd.h
> > +++ b/drivers/nvdimm/nd.h
> > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> >  		struct nd_namespace_common *ndns);
> >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> >  /* max struct page size independent of kernel config */
> > -#define MAX_STRUCT_PAGE_SIZE 128
> > +#define MAX_STRUCT_PAGE_SIZE 64
> >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> >  #else
> >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > index 61af072ac98f..978d63559c0e 100644
> > --- a/drivers/nvdimm/pfn_devs.c
> > +++ b/drivers/nvdimm/pfn_devs.c
> > @@ -13,6 +13,11 @@
> >  #include "pfn.h"
> >  #include "nd.h"
> >  
> > +static bool page_struct_override;
> > +module_param(page_struct_override, bool, 0644);
> > +MODULE_PARM_DESC(page_struct_override,
> > +		 "Force namespace creation in the presence of mm-debug.");
> 
> I can't figure out from this description what this is for so perhaps it
> should be either removed and made dynamic (if you know you want to debug
> the mm core, why not turn it on then?) or made more obvious what is
> happening?

I'll kill it and update the KMSAN Documentation that KMSAN has
interactions with the NVDIMM subsystem that may cause some namespaces to
fail to enable. That Documentation needs to be a part of this patch
regardless as that would be the default behavior of this module
parameter.

Unfortunately, it can not be dynamically enabled because the size of
'struct page' is unfortunately recorded in the metadata of the device.
Recall this is for supporting platform configurations where the capacity
of the persistent memory exceeds or consumes too much of System RAM.
Consider 4TB of PMEM consumes 64GB of space just for 'struct page'. So,
NVDIMM subsystem has a mode to store that page array in a reservation on
the PMEM device itself.

KMSAN mandates either that all namespaces all the time reserve the extra
capacity, or that those namespace cannot be mapped while KMSAN is
enabled.
Alexander Potapenko Jan. 10, 2023, 8:48 a.m. UTC | #9
On Tue, Jan 10, 2023 at 7:55 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Greg Kroah-Hartman wrote:
> > On Mon, Jan 09, 2023 at 02:06:36PM -0800, Dan Williams wrote:
> > > Alexander Potapenko wrote:
> > > > On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >
> > > > > Alexander Potapenko wrote:
> > > > > > (+ Dan Williams)
> > > > > > (resending with patch context included)
> > > > > >
> > > > > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > > > > >
> > > > > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > > > > >
> > > > > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > > > > 64 bytes anymore.
> > > > > > >
> > > > > > > Does this somehow cause extra space being used in all kernel configs?
> > > > > > > If not, it would be good to note this in the commit message.
> > > > > > >
> > > > > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > > > > Looks like this increases the amount of memory used by the nvdimm
> > > > > > driver in all kernel configs that enable it (including those that
> > > > > > don't use KMSAN), but I am not sure how much is that.
> > > > > >
> > > > > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> > > > >
> > > > > Apologies I missed this several months ago. The answer is that this
> > > > > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > > > > capacity of their namespace even when not using KMSAN which is too
> > > > > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > > > > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > > > > something like:
> > > > >
> > > > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > > > > index 79d93126453d..5693869b720b 100644
> > > > > --- a/drivers/nvdimm/Kconfig
> > > > > +++ b/drivers/nvdimm/Kconfig
> > > > > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> > > > >         bool "PFN: Map persistent (device) memory"
> > > > >         default LIBNVDIMM
> > > > >         depends on ZONE_DEVICE
> > > > > +       depends on !KMSAN
> > > > >         select ND_CLAIM
> > > > >         help
> > > > >           Map persistent memory, i.e. advertise it to the memory
> > > > >
> > > > >
> > > > > ...otherwise, what was the rationale for increasing this value? Were you
> > > > > actually trying to use KMSAN for DAX pages?
> > > >
> > > > I was just building the kernel with nvdimm driver and KMSAN enabled.
> > > > Because KMSAN adds extra data to every struct page, it immediately hit
> > > > the following assert:
> > > >
> > > > drivers/nvdimm/pfn_devs.c:796:3: error: call to
> > > > __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> > > > fE
> > > >                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> > > >
> > > > The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> > > > page size independent of kernel config", but maybe we can afford
> > > > making it dependent on CONFIG_KMSAN (and possibly other config options
> > > > that increase struct page size)?
> > > >
> > > > I don't mind disabling the driver under KMSAN, but having an extra
> > > > ifdef to keep KMSAN support sounds reasonable, WDYT?
> > >
> > > How about a module parameter to opt-in to the increased permanent
> > > capacity loss?
> >
> > Please no, this isn't the 1990's, we should never force users to keep
> > track of new module parameters that you then have to support for
> > forever.
>
> Fair enough, premature enabling. If someone really wants this they can
> find this thread in the archives and ask for another solution like
> compile time override.
>
> >
> >
> > >
> > > -- >8 --
> > > >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> > > From: Dan Williams <dan.j.williams@intel.com>
> > > Date: Thu, 5 Jan 2023 13:27:34 -0800
> > > Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> > >
> > > Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > >
> > > ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> > > potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> > > doubles the amount of capacity stolen from user addressable capacity for
> > > everyone, regardless of whether they are using the debug option. Revert
> > > that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> > > allow for debug scenarios to proceed with creating debug sized page maps
> > > with a new 'libnvdimm.page_struct_override' module parameter.
> > >
> > > Note that this only applies to cases where the page map is permanent,
> > > i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> > > create-namespace" terms). For the "--map=mem" case, since the allocation
> > > is ephemeral for the lifespan of the namespace, there are no explicit
> > > restriction. However, the implicit restriction, of having enough
> > > available "System RAM" to store the page map for the typically large
> > > pmem, still applies.
> > >
> > > Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > > Cc: <stable@vger.kernel.org>
> > > Cc: Alexander Potapenko <glider@google.com>
> > > Cc: Marco Elver <elver@google.com>
> > > Reported-by: Jeff Moyer <jmoyer@redhat.com>
> > > ---
> > >  drivers/nvdimm/nd.h       |  2 +-
> > >  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
> > >  2 files changed, 31 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > > index 85ca5b4da3cf..ec5219680092 100644
> > > --- a/drivers/nvdimm/nd.h
> > > +++ b/drivers/nvdimm/nd.h
> > > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> > >             struct nd_namespace_common *ndns);
> > >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> > >  /* max struct page size independent of kernel config */
> > > -#define MAX_STRUCT_PAGE_SIZE 128
> > > +#define MAX_STRUCT_PAGE_SIZE 64
> > >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> > >  #else
> > >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > > index 61af072ac98f..978d63559c0e 100644
> > > --- a/drivers/nvdimm/pfn_devs.c
> > > +++ b/drivers/nvdimm/pfn_devs.c
> > > @@ -13,6 +13,11 @@
> > >  #include "pfn.h"
> > >  #include "nd.h"
> > >
> > > +static bool page_struct_override;
> > > +module_param(page_struct_override, bool, 0644);
> > > +MODULE_PARM_DESC(page_struct_override,
> > > +            "Force namespace creation in the presence of mm-debug.");
> >
> > I can't figure out from this description what this is for so perhaps it
> > should be either removed and made dynamic (if you know you want to debug
> > the mm core, why not turn it on then?) or made more obvious what is
> > happening?
>
> I'll kill it and update the KMSAN Documentation that KMSAN has
> interactions with the NVDIMM subsystem that may cause some namespaces to
> fail to enable. That Documentation needs to be a part of this patch
> regardless as that would be the default behavior of this module
> parameter.
>
> Unfortunately, it can not be dynamically enabled because the size of
> 'struct page' is unfortunately recorded in the metadata of the device.
> Recall this is for supporting platform configurations where the capacity
> of the persistent memory exceeds or consumes too much of System RAM.
> Consider 4TB of PMEM consumes 64GB of space just for 'struct page'. So,
> NVDIMM subsystem has a mode to store that page array in a reservation on
> the PMEM device itself.

Sorry, I might be missing something, but why cannot we have

#ifdef CONFIG_KMSAN
#define MAX_STRUCT_PAGE_SIZE 128
#else
#define MAX_STRUCT_PAGE_SIZE 64
#endif

?

KMSAN is a debug-only tool, it already consumes more than two thirds
of the system memory, so you don't want to enable it in any production
environment anyway.

> KMSAN mandates either that all namespaces all the time reserve the extra
> capacity, or that those namespace cannot be mapped while KMSAN is
> enabled.

Struct page depends on a couple of config options that affect its
size, and has already been approaching the 64 byte boundary.
It is unfortunate that the introduction of KMSAN was the last straw,
but it could've been any other debug config that needs to store data
in the struct page.
Keeping the struct within cacheline size sounds reasonable for the
default configuration, but having a build-time assert that prevents us
from building debug configs sounds excessive.

> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/63bd0be8945a0_5178e29414%40dwillia2-xfh.jf.intel.com.notmuch.
Alexander Potapenko Jan. 10, 2023, 8:52 a.m. UTC | #10
> > >
> > > >
> > > > -- >8 --
> > > > >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > > Date: Thu, 5 Jan 2023 13:27:34 -0800
> > > > Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> > > >
> > > > Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > > >
> > > > ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> > > > potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> > > > doubles the amount of capacity stolen from user addressable capacity for
> > > > everyone, regardless of whether they are using the debug option. Revert
> > > > that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> > > > allow for debug scenarios to proceed with creating debug sized page maps
> > > > with a new 'libnvdimm.page_struct_override' module parameter.
> > > >
> > > > Note that this only applies to cases where the page map is permanent,
> > > > i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> > > > create-namespace" terms). For the "--map=mem" case, since the allocation
> > > > is ephemeral for the lifespan of the namespace, there are no explicit
> > > > restriction. However, the implicit restriction, of having enough
> > > > available "System RAM" to store the page map for the typically large
> > > > pmem, still applies.
> > > >
> > > > Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > > > Cc: <stable@vger.kernel.org>
> > > > Cc: Alexander Potapenko <glider@google.com>
> > > > Cc: Marco Elver <elver@google.com>
> > > > Reported-by: Jeff Moyer <jmoyer@redhat.com>
> > > > ---
> > > >  drivers/nvdimm/nd.h       |  2 +-
> > > >  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
> > > >  2 files changed, 31 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > > > index 85ca5b4da3cf..ec5219680092 100644
> > > > --- a/drivers/nvdimm/nd.h
> > > > +++ b/drivers/nvdimm/nd.h
> > > > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> > > >             struct nd_namespace_common *ndns);
> > > >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> > > >  /* max struct page size independent of kernel config */
> > > > -#define MAX_STRUCT_PAGE_SIZE 128
> > > > +#define MAX_STRUCT_PAGE_SIZE 64
> > > >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> > > >  #else
> > > >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > > > index 61af072ac98f..978d63559c0e 100644
> > > > --- a/drivers/nvdimm/pfn_devs.c
> > > > +++ b/drivers/nvdimm/pfn_devs.c
> > > > @@ -13,6 +13,11 @@
> > > >  #include "pfn.h"
> > > >  #include "nd.h"
> > > >
> > > > +static bool page_struct_override;
> > > > +module_param(page_struct_override, bool, 0644);
> > > > +MODULE_PARM_DESC(page_struct_override,
> > > > +            "Force namespace creation in the presence of mm-debug.");
> > >
> > > I can't figure out from this description what this is for so perhaps it
> > > should be either removed and made dynamic (if you know you want to debug
> > > the mm core, why not turn it on then?) or made more obvious what is
> > > happening?
> >
> > I'll kill it and update the KMSAN Documentation that KMSAN has
> > interactions with the NVDIMM subsystem that may cause some namespaces to
> > fail to enable. That Documentation needs to be a part of this patch
> > regardless as that would be the default behavior of this module
> > parameter.
> >
> > Unfortunately, it can not be dynamically enabled because the size of
> > 'struct page' is unfortunately recorded in the metadata of the device.
> > Recall this is for supporting platform configurations where the capacity
> > of the persistent memory exceeds or consumes too much of System RAM.
> > Consider 4TB of PMEM consumes 64GB of space just for 'struct page'. So,
> > NVDIMM subsystem has a mode to store that page array in a reservation on
> > the PMEM device itself.
>
> Sorry, I might be missing something, but why cannot we have
>
> #ifdef CONFIG_KMSAN
> #define MAX_STRUCT_PAGE_SIZE 128
By the way, KMSAN only adds 16 bytes to struct page - would it help to
reduce MAX_STRUCT_PAGE_SIZE to 80 bytes?
> #else
> #define MAX_STRUCT_PAGE_SIZE 64
> #endif
>
Eric Dumazet Jan. 10, 2023, 8:53 a.m. UTC | #11
On Tue, Jan 10, 2023 at 9:49 AM Alexander Potapenko <glider@google.com> wrote:
>
> On Tue, Jan 10, 2023 at 7:55 AM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Greg Kroah-Hartman wrote:
> > > On Mon, Jan 09, 2023 at 02:06:36PM -0800, Dan Williams wrote:
> > > > Alexander Potapenko wrote:
> > > > > On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > > >
> > > > > > Alexander Potapenko wrote:
> > > > > > > (+ Dan Williams)
> > > > > > > (resending with patch context included)
> > > > > > >
> > > > > > > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > > > > > > >
> > > > > > > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > > > > > > 64 bytes anymore.
> > > > > > > >
> > > > > > > > Does this somehow cause extra space being used in all kernel configs?
> > > > > > > > If not, it would be good to note this in the commit message.
> > > > > > > >
> > > > > > > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > > > > > > Looks like this increases the amount of memory used by the nvdimm
> > > > > > > driver in all kernel configs that enable it (including those that
> > > > > > > don't use KMSAN), but I am not sure how much is that.
> > > > > > >
> > > > > > > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
> > > > > >
> > > > > > Apologies I missed this several months ago. The answer is that this
> > > > > > causes everyone creating PMEM namespaces on v6.1+ to lose double the
> > > > > > capacity of their namespace even when not using KMSAN which is too
> > > > > > wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> > > > > > increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> > > > > > something like:
> > > > > >
> > > > > > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > > > > > index 79d93126453d..5693869b720b 100644
> > > > > > --- a/drivers/nvdimm/Kconfig
> > > > > > +++ b/drivers/nvdimm/Kconfig
> > > > > > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> > > > > >         bool "PFN: Map persistent (device) memory"
> > > > > >         default LIBNVDIMM
> > > > > >         depends on ZONE_DEVICE
> > > > > > +       depends on !KMSAN
> > > > > >         select ND_CLAIM
> > > > > >         help
> > > > > >           Map persistent memory, i.e. advertise it to the memory
> > > > > >
> > > > > >
> > > > > > ...otherwise, what was the rationale for increasing this value? Were you
> > > > > > actually trying to use KMSAN for DAX pages?
> > > > >
> > > > > I was just building the kernel with nvdimm driver and KMSAN enabled.
> > > > > Because KMSAN adds extra data to every struct page, it immediately hit
> > > > > the following assert:
> > > > >
> > > > > drivers/nvdimm/pfn_devs.c:796:3: error: call to
> > > > > __compiletime_assert_330 declared with 'error' attribute: BUILD_BUG_ON
> > > > > fE
> > > > >                 BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
> > > > >
> > > > > The comment before MAX_STRUCT_PAGE_SIZE declaration says "max struct
> > > > > page size independent of kernel config", but maybe we can afford
> > > > > making it dependent on CONFIG_KMSAN (and possibly other config options
> > > > > that increase struct page size)?
> > > > >
> > > > > I don't mind disabling the driver under KMSAN, but having an extra
> > > > > ifdef to keep KMSAN support sounds reasonable, WDYT?
> > > >
> > > > How about a module parameter to opt-in to the increased permanent
> > > > capacity loss?
> > >
> > > Please no, this isn't the 1990's, we should never force users to keep
> > > track of new module parameters that you then have to support for
> > > forever.
> >
> > Fair enough, premature enabling. If someone really wants this they can
> > find this thread in the archives and ask for another solution like
> > compile time override.
> >
> > >
> > >
> > > >
> > > > -- >8 --
> > > > >From 693563817dea3fd8f293f9b69ec78066ab1d96d2 Mon Sep 17 00:00:00 2001
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > > Date: Thu, 5 Jan 2023 13:27:34 -0800
> > > > Subject: [PATCH] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
> > > >
> > > > Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > > >
> > > > ...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
> > > > potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
> > > > doubles the amount of capacity stolen from user addressable capacity for
> > > > everyone, regardless of whether they are using the debug option. Revert
> > > > that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
> > > > allow for debug scenarios to proceed with creating debug sized page maps
> > > > with a new 'libnvdimm.page_struct_override' module parameter.
> > > >
> > > > Note that this only applies to cases where the page map is permanent,
> > > > i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
> > > > create-namespace" terms). For the "--map=mem" case, since the allocation
> > > > is ephemeral for the lifespan of the namespace, there are no explicit
> > > > restriction. However, the implicit restriction, of having enough
> > > > available "System RAM" to store the page map for the typically large
> > > > pmem, still applies.
> > > >
> > > > Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
> > > > Cc: <stable@vger.kernel.org>
> > > > Cc: Alexander Potapenko <glider@google.com>
> > > > Cc: Marco Elver <elver@google.com>
> > > > Reported-by: Jeff Moyer <jmoyer@redhat.com>
> > > > ---
> > > >  drivers/nvdimm/nd.h       |  2 +-
> > > >  drivers/nvdimm/pfn_devs.c | 45 ++++++++++++++++++++++++++-------------
> > > >  2 files changed, 31 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> > > > index 85ca5b4da3cf..ec5219680092 100644
> > > > --- a/drivers/nvdimm/nd.h
> > > > +++ b/drivers/nvdimm/nd.h
> > > > @@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
> > > >             struct nd_namespace_common *ndns);
> > > >  #if IS_ENABLED(CONFIG_ND_CLAIM)
> > > >  /* max struct page size independent of kernel config */
> > > > -#define MAX_STRUCT_PAGE_SIZE 128
> > > > +#define MAX_STRUCT_PAGE_SIZE 64
> > > >  int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
> > > >  #else
> > > >  static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
> > > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > > > index 61af072ac98f..978d63559c0e 100644
> > > > --- a/drivers/nvdimm/pfn_devs.c
> > > > +++ b/drivers/nvdimm/pfn_devs.c
> > > > @@ -13,6 +13,11 @@
> > > >  #include "pfn.h"
> > > >  #include "nd.h"
> > > >
> > > > +static bool page_struct_override;
> > > > +module_param(page_struct_override, bool, 0644);
> > > > +MODULE_PARM_DESC(page_struct_override,
> > > > +            "Force namespace creation in the presence of mm-debug.");
> > >
> > > I can't figure out from this description what this is for so perhaps it
> > > should be either removed and made dynamic (if you know you want to debug
> > > the mm core, why not turn it on then?) or made more obvious what is
> > > happening?
> >
> > I'll kill it and update the KMSAN Documentation that KMSAN has
> > interactions with the NVDIMM subsystem that may cause some namespaces to
> > fail to enable. That Documentation needs to be a part of this patch
> > regardless as that would be the default behavior of this module
> > parameter.
> >
> > Unfortunately, it can not be dynamically enabled because the size of
> > 'struct page' is unfortunately recorded in the metadata of the device.
> > Recall this is for supporting platform configurations where the capacity
> > of the persistent memory exceeds or consumes too much of System RAM.
> > Consider 4TB of PMEM consumes 64GB of space just for 'struct page'. So,
> > NVDIMM subsystem has a mode to store that page array in a reservation on
> > the PMEM device itself.
>
> Sorry, I might be missing something, but why cannot we have
>
> #ifdef CONFIG_KMSAN
> #define MAX_STRUCT_PAGE_SIZE 128
> #else
> #define MAX_STRUCT_PAGE_SIZE 64
> #endif
>

Possibly because this needs to be a fixed size on permanent storage
(like an inode on a disk file system)



> ?
>
> KMSAN is a debug-only tool, it already consumes more than two thirds
> of the system memory, so you don't want to enable it in any production
> environment anyway.
>
> > KMSAN mandates either that all namespaces all the time reserve the extra
> > capacity, or that those namespace cannot be mapped while KMSAN is
> > enabled.
>
> Struct page depends on a couple of config options that affect its
> size, and has already been approaching the 64 byte boundary.
> It is unfortunate that the introduction of KMSAN was the last straw,
> but it could've been any other debug config that needs to store data
> in the struct page.
> Keeping the struct within cacheline size sounds reasonable for the
> default configuration, but having a build-time assert that prevents us
> from building debug configs sounds excessive.
>
> > --
> > You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/63bd0be8945a0_5178e29414%40dwillia2-xfh.jf.intel.com.notmuch.
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Liana Sebastian
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
Christoph Hellwig Jan. 10, 2023, 8:55 a.m. UTC | #12
Folks, can you please trim your quotes to what is relevant?  I've not
actually beeng finding any relevant information in the last three mails
before giving up the scrolling after multiple pages.
Alexander Potapenko Jan. 10, 2023, 9:14 a.m. UTC | #13
> > > Unfortunately, it can not be dynamically enabled because the size of
> > > 'struct page' is unfortunately recorded in the metadata of the device.
> > > Recall this is for supporting platform configurations where the capacity
> > > of the persistent memory exceeds or consumes too much of System RAM.
> > > Consider 4TB of PMEM consumes 64GB of space just for 'struct page'. So,
> > > NVDIMM subsystem has a mode to store that page array in a reservation on
> > > the PMEM device itself.
> >
> > Sorry, I might be missing something, but why cannot we have
> >
> > #ifdef CONFIG_KMSAN
> > #define MAX_STRUCT_PAGE_SIZE 128
> > #else
> > #define MAX_STRUCT_PAGE_SIZE 64
> > #endif
> >
>
> Possibly because this needs to be a fixed size on permanent storage
> (like an inode on a disk file system)
>

Ah, thank you, that makes sense.
Then I'm assuming some contents of struct pages are also stored in the
persistent memory. What happens if sizeof(struct page) stays below
MAX_STRUCT_PAGE_SIZE, but the layout changes?
E.g. doesn't it cause problems if the user enables/disables CONFIG_MEMCG?
Steven Rostedt Jan. 10, 2023, 3:35 p.m. UTC | #14
On Tue, 10 Jan 2023 09:55:49 +0100
Christoph Hellwig <hch@lst.de> wrote:

> Folks, can you please trim your quotes to what is relevant?  I've not
> actually beeng finding any relevant information in the last three mails
> before giving up the scrolling after multiple pages.

If I don't see a response after three scrolls, I ignore the email.

-- Steve
Alexander Potapenko Jan. 30, 2023, 8:34 a.m. UTC | #15
On Thu, Jan 5, 2023 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Alexander Potapenko wrote:
> > (+ Dan Williams)
> > (resending with patch context included)
> >
> > On Mon, Jul 11, 2022 at 6:27 PM Marco Elver <elver@google.com> wrote:
> > >
> > > On Fri, 1 Jul 2022 at 16:23, Alexander Potapenko <glider@google.com> wrote:
> > > >
> > > > KMSAN adds extra metadata fields to struct page, so it does not fit into
> > > > 64 bytes anymore.
> > >
> > > Does this somehow cause extra space being used in all kernel configs?
> > > If not, it would be good to note this in the commit message.
> > >
> > I actually couldn't verify this on QEMU, because the driver never got loaded.
> > Looks like this increases the amount of memory used by the nvdimm
> > driver in all kernel configs that enable it (including those that
> > don't use KMSAN), but I am not sure how much is that.
> >
> > Dan, do you know how bad increasing MAX_STRUCT_PAGE_SIZE can be?
>
> Apologies I missed this several months ago. The answer is that this
> causes everyone creating PMEM namespaces on v6.1+ to lose double the
> capacity of their namespace even when not using KMSAN which is too
> wasteful to tolerate. So, I think "6e9f05dc66f9 libnvdimm/pfn_dev:
> increase MAX_STRUCT_PAGE_SIZE" needs to be reverted and replaced with
> something like:
>
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 79d93126453d..5693869b720b 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -63,6 +63,7 @@ config NVDIMM_PFN
>         bool "PFN: Map persistent (device) memory"
>         default LIBNVDIMM
>         depends on ZONE_DEVICE
> +       depends on !KMSAN
>         select ND_CLAIM
>         help
>           Map persistent memory, i.e. advertise it to the memory
>

Looks like we still don't have a resolution for this problem.
I have the following options in mind:

1. Set MAX_STRUCT_PAGE_SIZE to 80 (i.e. increase it by 2*sizeof(struct
page *) added by KMSAN) instead of 128.
2. Disable storing of struct pages on device for KMSAN builds.

, but if those are infeasible, we can always go for:

3. Disable KMSAN for NVDIMM and reflect it in Documentation. I am
happy to send the patch if we decide this is the best option.
Dan Williams Jan. 30, 2023, 6:57 p.m. UTC | #16
Alexander Potapenko wrote:
[..]
> >
> > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > index 79d93126453d..5693869b720b 100644
> > --- a/drivers/nvdimm/Kconfig
> > +++ b/drivers/nvdimm/Kconfig
> > @@ -63,6 +63,7 @@ config NVDIMM_PFN
> >         bool "PFN: Map persistent (device) memory"
> >         default LIBNVDIMM
> >         depends on ZONE_DEVICE
> > +       depends on !KMSAN
> >         select ND_CLAIM
> >         help
> >           Map persistent memory, i.e. advertise it to the memory
> >
> 
> Looks like we still don't have a resolution for this problem.
> I have the following options in mind:
> 
> 1. Set MAX_STRUCT_PAGE_SIZE to 80 (i.e. increase it by 2*sizeof(struct
> page *) added by KMSAN) instead of 128.
> 2. Disable storing of struct pages on device for KMSAN builds.
> 
> , but if those are infeasible, we can always go for:
> 
> 3. Disable KMSAN for NVDIMM and reflect it in Documentation. I am
> happy to send the patch if we decide this is the best option.

I copied you on the new proposal here:

https://lore.kernel.org/nvdimm/167467815773.463042.7022545814443036382.stgit@dwillia2-xfh.jf.intel.com/

It disables PMEM namespace creation with page-array reservations when
sizeof(struct page) > 64.

Note, it was pre-existing behavior for PMEM namespaces with too small of
a reservation to fail to enable. That gives me confidence that the
restriction to lose some PMEM namespace access with these memory debug
facilities is acceptable.
diff mbox series

Patch

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index ec5219680092d..85ca5b4da3cf3 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -652,7 +652,7 @@  void devm_namespace_disable(struct device *dev,
 		struct nd_namespace_common *ndns);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
 /* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 64
+#define MAX_STRUCT_PAGE_SIZE 128
 int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 #else
 static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 0e92ab4b32833..61af072ac98f9 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -787,7 +787,7 @@  static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 *
-		 * Also make sure size of struct page is less than 64. We
+		 * Also make sure size of struct page is less than 128. We
 		 * want to make sure we use large enough size here so that
 		 * we don't have a dynamic reserve space depending on
 		 * struct page size. But we also want to make sure we notice