
mm/mremap_pages: Fix static key devmap_managed_key updates

Message ID 20201022060753.21173-1-aneesh.kumar@linux.ibm.com
State New
Series mm/mremap_pages: Fix static key devmap_managed_key updates

Commit Message

Aneesh Kumar K.V Oct. 22, 2020, 6:07 a.m. UTC
Commit 6f42193fd86e ("memremap: don't use a separate devm action for
devmap_managed_enable_get") changed the static key updates such that we
now call devmap_managed_enable_put() without doing the equivalent
devmap_managed_enable_get().

devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
MEMORY_DEVICE_FS_DAX, but memunmap_pages() gets called for other pgmap
types too. This results in the warning below when switching a devdax
namespace between system-ram and devdax mode.

 jump label: negative count!
 WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
 Modules linked in:
 ....

 NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
 LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
 Call Trace:
 [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
 [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
 [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
 [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
 [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
 [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
 [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
 [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
 [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
 [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
 [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
 [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
 [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
 [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
 [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
 [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
 [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
 [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
 [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
 [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
 [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: linux-nvdimm@lists.01.org
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memremap.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)
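
The imbalance described in the commit message can be modelled in user space.
The following is only a minimal sketch, not kernel code: the plain integer
stands in for the reference count behind devmap_managed_key, and every
model_* name is a hypothetical placeholder invented for illustration. It
shows why an unconditional decrement on the unmap path drives the count
negative for pgmap types that never took a reference.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pgmap types named in the patch. */
enum model_type {
	MODEL_PRIVATE,     /* models MEMORY_DEVICE_PRIVATE */
	MODEL_FS_DAX,      /* models MEMORY_DEVICE_FS_DAX */
	MODEL_GENERIC,     /* models MEMORY_DEVICE_GENERIC */
	MODEL_PCI_P2PDMA,  /* models MEMORY_DEVICE_PCI_P2PDMA */
};

/* Models the static key reference count (assumption: a plain int). */
static int model_key_count;

/* Only these two types take a reference on the remap path. */
static bool model_needs_key(enum model_type type)
{
	return type == MODEL_PRIVATE || type == MODEL_FS_DAX;
}

/* Models memremap_pages(): increment only for managed types. */
static void model_remap(enum model_type type)
{
	if (model_needs_key(type))
		model_key_count++;
}

/* Models memunmap_pages() before the fix: always decrements. */
static void model_unmap_buggy(enum model_type type)
{
	(void)type;
	model_key_count--;
}

/* Models memunmap_pages() after the fix: decrement mirrors increment. */
static void model_unmap_fixed(enum model_type type)
{
	if (model_needs_key(type))
		model_key_count--;
}

/* Run one remap/unmap cycle from a zero count and return the result. */
int model_cycle(enum model_type type, bool fixed)
{
	model_key_count = 0;
	model_remap(type);
	if (fixed)
		model_unmap_fixed(type);
	else
		model_unmap_buggy(type);
	return model_key_count;
}
```

In this model, model_cycle(MODEL_GENERIC, false) leaves the count at -1,
which corresponds to the "negative count" warning, while the same cycle with
fixed set to true leaves it at 0.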

Comments

Sachin Sant Oct. 22, 2020, 8:34 a.m. UTC | #1
> jump label: negative count!
> WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
> Modules linked in:
> ....
> 

> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Cc: linux-nvdimm@lists.01.org
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---

Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>

-Sachin
Christoph Hellwig Oct. 22, 2020, 1:26 p.m. UTC | #2
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Ira Weiny Oct. 22, 2020, 3:41 p.m. UTC | #3
On Thu, Oct 22, 2020 at 11:37:53AM +0530, Aneesh Kumar K.V wrote:
> commit 6f42193fd86e ("memremap: don't use a separate devm action for
> devmap_managed_enable_get") changed the static key updates such that we
> now call devmap_managed_enable_put() without doing the equivalent
> devmap_managed_enable_get().
> 
> devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
> MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
> types too. This results in the below warning when switching between
> system-ram and devdax mode for devdax namespace.
> 
>  jump label: negative count!
>  WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
>  Modules linked in:
>  ....
> 
>  NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
>  LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
>  Call Trace:
>  [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
>  [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
>  [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
>  [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
>  [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
>  [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>  [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>  [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
>  [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
>  [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
>  [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
>  [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>  [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>  [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
>  [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
>  [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
>  [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
>  [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
>  [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
>  [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
>  [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Cc: linux-nvdimm@lists.01.org
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/memremap.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 73a206d0f645..d4402ff3e467 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -158,6 +158,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  {
>  	unsigned long pfn;
>  	int i;
> +	bool need_devmap_managed = false;
> +
> +	switch (pgmap->type) {
> +	case MEMORY_DEVICE_PRIVATE:
> +	case MEMORY_DEVICE_FS_DAX:
> +		need_devmap_managed = true;
> +		break;
> +	default:
> +		break;
> +	}

Is it overkill to avoid duplicating this switch logic in
page_is_devmap_managed() by creating another call which can be used here?

>  
>  	dev_pagemap_kill(pgmap);
>  	for (i = 0; i < pgmap->nr_range; i++)
> @@ -169,7 +179,8 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  		pageunmap_range(pgmap, i);
>  
>  	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
> -	devmap_managed_enable_put();
> +	if (need_devmap_managed)
> +		devmap_managed_enable_put();
>  }
>  EXPORT_SYMBOL_GPL(memunmap_pages);
>  
> @@ -307,7 +318,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  		.pgprot = PAGE_KERNEL,
>  	};
>  	const int nr_range = pgmap->nr_range;
> -	bool need_devmap_managed = true;
> +	bool need_devmap_managed = false;

I'm CC'ing Ralph Campbell because I think some of his work has proposed this
same change.

Ira

>  	int error, i;
>  
>  	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
> @@ -327,6 +338,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  			WARN(1, "Missing owner\n");
>  			return ERR_PTR(-EINVAL);
>  		}
> +		need_devmap_managed = true;
>  		break;
>  	case MEMORY_DEVICE_FS_DAX:
>  		if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
> @@ -334,13 +346,12 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  			WARN(1, "File system DAX not supported\n");
>  			return ERR_PTR(-EINVAL);
>  		}
> +		need_devmap_managed = true;
>  		break;
>  	case MEMORY_DEVICE_GENERIC:
> -		need_devmap_managed = false;
>  		break;
>  	case MEMORY_DEVICE_PCI_P2PDMA:
>  		params.pgprot = pgprot_noncached(params.pgprot);
> -		need_devmap_managed = false;
>  		break;
>  	default:
>  		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
> -- 
> 2.26.2
>
Ralph Campbell Oct. 22, 2020, 6:19 p.m. UTC | #4
On 10/22/20 8:41 AM, Ira Weiny wrote:
> On Thu, Oct 22, 2020 at 11:37:53AM +0530, Aneesh Kumar K.V wrote:
>> commit 6f42193fd86e ("memremap: don't use a separate devm action for
>> devmap_managed_enable_get") changed the static key updates such that we
>> now call devmap_managed_enable_put() without doing the equivalent
>> devmap_managed_enable_get().
>>
>> devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
>> MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
>> types too. This results in the below warning when switching between
>> system-ram and devdax mode for devdax namespace.
>>
>>   jump label: negative count!
>>   WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
>>   Modules linked in:
>>   ....
>>
>>   NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
>>   LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
>>   Call Trace:
>>   [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
>>   [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
>>   [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
>>   [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
>>   [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
>>   [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>>   [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>>   [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
>>   [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
>>   [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
>>   [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
>>   [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>>   [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>>   [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
>>   [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
>>   [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
>>   [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
>>   [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
>>   [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
>>   [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
>>   [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c
>>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Cc: linux-nvdimm@lists.01.org
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Cc: Jason Gunthorpe <jgg@mellanox.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   mm/memremap.c | 19 +++++++++++++++----
>>   1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 73a206d0f645..d4402ff3e467 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -158,6 +158,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>   {
>>   	unsigned long pfn;
>>   	int i;
>> +	bool need_devmap_managed = false;
>> +
>> +	switch (pgmap->type) {
>> +	case MEMORY_DEVICE_PRIVATE:
>> +	case MEMORY_DEVICE_FS_DAX:
>> +		need_devmap_managed = true;
>> +		break;
>> +	default:
>> +		break;
>> +	}
> 
> Is it overkill to avoid duplicating this switch logic in
> page_is_devmap_managed() by creating another call which can be used here?

Perhaps. I can imagine a helper defined in include/linux/mm.h which
page_is_devmap_managed() could also call but that would impact a lot of
places that include mm.h. Since memremap.c already has to have intimate
knowledge of the pgmap->type, I think limiting the change to just what
is needed is better for now. So the patch looks OK to me.

Looking at this some more, I would suggest changing devmap_managed_enable_get()
and devmap_managed_enable_put() to do the special case checking instead of
doing it in memremap_pages() and memunmap_pages().
Then devmap_managed_enable_get() doesn't need to return an error if
CONFIG_DEV_PAGEMAP_OPS isn't defined. I have only compile tested the
following.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
  mm/memremap.c | 39 ++++++++++++++++-----------------------
  1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/mm/memremap.c b/mm/memremap.c
index 73a206d0f645..16b2fb482da1 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -41,28 +41,24 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
  DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
  EXPORT_SYMBOL(devmap_managed_key);
  
-static void devmap_managed_enable_put(void)
+static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
  {
-	static_branch_dec(&devmap_managed_key);
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
+	    pgmap->type == MEMORY_DEVICE_FS_DAX)
+		static_branch_dec(&devmap_managed_key);
  }
  
-static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
+static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
  {
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
-	    (!pgmap->ops || !pgmap->ops->page_free)) {
-		WARN(1, "Missing page_free method\n");
-		return -EINVAL;
-	}
-
-	static_branch_inc(&devmap_managed_key);
-	return 0;
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
+	    pgmap->type == MEMORY_DEVICE_FS_DAX)
+		static_branch_inc(&devmap_managed_key);
  }
  #else
-static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
+static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
  {
-	return -EINVAL;
  }
-static void devmap_managed_enable_put(void)
+static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
  {
  }
  #endif /* CONFIG_DEV_PAGEMAP_OPS */
@@ -169,7 +165,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
  		pageunmap_range(pgmap, i);
  
  	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
-	devmap_managed_enable_put();
+	devmap_managed_enable_put(pgmap);
  }
  EXPORT_SYMBOL_GPL(memunmap_pages);
  
@@ -307,7 +303,6 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
  		.pgprot = PAGE_KERNEL,
  	};
  	const int nr_range = pgmap->nr_range;
-	bool need_devmap_managed = true;
  	int error, i;
  
  	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
@@ -323,6 +318,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
  			WARN(1, "Missing migrate_to_ram method\n");
  			return ERR_PTR(-EINVAL);
  		}
+		if (!pgmap->ops->page_free) {
+			WARN(1, "Missing page_free method\n");
+			return ERR_PTR(-EINVAL);
+		}
  		if (!pgmap->owner) {
  			WARN(1, "Missing owner\n");
  			return ERR_PTR(-EINVAL);
@@ -336,11 +335,9 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
  		}
  		break;
  	case MEMORY_DEVICE_GENERIC:
-		need_devmap_managed = false;
  		break;
  	case MEMORY_DEVICE_PCI_P2PDMA:
  		params.pgprot = pgprot_noncached(params.pgprot);
-		need_devmap_managed = false;
  		break;
  	default:
  		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
@@ -364,11 +361,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
  		}
  	}
  
-	if (need_devmap_managed) {
-		error = devmap_managed_enable_get(pgmap);
-		if (error)
-			return ERR_PTR(error);
-	}
+	devmap_managed_enable_get(pgmap);
  
  	/*
  	 * Clear the pgmap nr_range as it will be incremented for each
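
The shape of the alternative above, where the get/put helpers take the pgmap
and do the type check themselves, can be sketched in user space as follows.
This is an illustrative model under stated assumptions, not the kernel
implementation: all sketch_* names are hypothetical, the integer stands in
for the static key count, and the combined ops/page_free test is a
simplification (in the diff the ops pointer is already validated by the
preceding migrate_to_ram check).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the pgmap types. */
enum sketch_type {
	SKETCH_PRIVATE,
	SKETCH_FS_DAX,
	SKETCH_GENERIC,
	SKETCH_PCI_P2PDMA,
};

struct sketch_ops {
	void (*page_free)(void *page);
};

struct sketch_pgmap {
	enum sketch_type type;
	const struct sketch_ops *ops;
};

/* Models the static key reference count (assumption: a plain int). */
static int sketch_key_count;

static bool sketch_is_managed(const struct sketch_pgmap *pgmap)
{
	return pgmap->type == SKETCH_PRIVATE ||
	       pgmap->type == SKETCH_FS_DAX;
}

/* get and put share one predicate, so they cannot drift apart. */
static void sketch_enable_get(const struct sketch_pgmap *pgmap)
{
	if (sketch_is_managed(pgmap))
		sketch_key_count++;
}

static void sketch_enable_put(const struct sketch_pgmap *pgmap)
{
	if (sketch_is_managed(pgmap))
		sketch_key_count--;
}

/*
 * Models the remap path: validation happens here, so the get helper
 * cannot fail. Returns 0 on success and -1 where the kernel code
 * would return ERR_PTR(-EINVAL).
 */
int sketch_remap(const struct sketch_pgmap *pgmap)
{
	if (pgmap->type == SKETCH_PRIVATE &&
	    (!pgmap->ops || !pgmap->ops->page_free))
		return -1;
	sketch_enable_get(pgmap);
	return 0;
}

/* Models the unmap path: callers need no type knowledge of their own. */
void sketch_unmap(const struct sketch_pgmap *pgmap)
{
	sketch_enable_put(pgmap);
}

/* Accessor for the modelled count. */
int sketch_count(void)
{
	return sketch_key_count;
}
```

Keeping the type check in one predicate used by both helpers means the
callers stay unconditional, and a failed validation returns before any
reference is taken, so there is nothing to undo on that path.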
Ira Weiny Oct. 22, 2020, 7:10 p.m. UTC | #5
On Thu, Oct 22, 2020 at 11:19:43AM -0700, Ralph Campbell wrote:
> 
> On 10/22/20 8:41 AM, Ira Weiny wrote:
> > On Thu, Oct 22, 2020 at 11:37:53AM +0530, Aneesh Kumar K.V wrote:
> > > commit 6f42193fd86e ("memremap: don't use a separate devm action for
> > > devmap_managed_enable_get") changed the static key updates such that we
> > > now call devmap_managed_enable_put() without doing the equivalent
> > > devmap_managed_enable_get().
> > > 
> > > devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
> > > MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
> > > types too. This results in the below warning when switching between
> > > system-ram and devdax mode for devdax namespace.
> > > 
> > >   jump label: negative count!
> > >   WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
> > >   Modules linked in:
> > >   ....
> > > 
> > >   NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
> > >   LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
> > >   Call Trace:
> > >   [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
> > >   [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
> > >   [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
> > >   [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
> > >   [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
> > >   [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
> > >   [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
> > >   [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
> > >   [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
> > >   [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
> > >   [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
> > >   [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
> > >   [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
> > >   [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
> > >   [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
> > >   [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
> > >   [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
> > >   [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
> > >   [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
> > >   [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
> > >   [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c
> > > 
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > > Cc: linux-nvdimm@lists.01.org
> > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > Cc: Jason Gunthorpe <jgg@mellanox.com>
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > ---
> > >   mm/memremap.c | 19 +++++++++++++++----
> > >   1 file changed, 15 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/mm/memremap.c b/mm/memremap.c
> > > index 73a206d0f645..d4402ff3e467 100644
> > > --- a/mm/memremap.c
> > > +++ b/mm/memremap.c
> > > @@ -158,6 +158,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
> > >   {
> > >   	unsigned long pfn;
> > >   	int i;
> > > +	bool need_devmap_managed = false;
> > > +
> > > +	switch (pgmap->type) {
> > > +	case MEMORY_DEVICE_PRIVATE:
> > > +	case MEMORY_DEVICE_FS_DAX:
> > > +		need_devmap_managed = true;
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > 
> > Is it overkill to avoid duplicating this switch logic in
> > page_is_devmap_managed() by creating another call which can be used here?
> 
> Perhaps. I can imagine a helper defined in include/linux/mm.h which
> page_is_devmap_managed() could also call but that would impact a lot of
> places that include mm.h. Since memremap.c already has to have intimate
> knowledge of the pgmap->type, I think limiting the change to just what
> is needed is better for now. So the patch looks OK to me.
> 
> Looking at this some more, I would suggest changing devmap_managed_enable_get()
> and devmap_managed_enable_put() to do the special case checking instead of
> doing it in memremap_pages() and memunmap_pages().
> Then devmap_managed_enable_get() doesn't need to return an error if
> CONFIG_DEV_PAGEMAP_OPS isn't defined. I have only compile tested the
> following.

This looks cleaner to me.  Aneesh?

FWIW:
Reviewed-by: Ira Weiny <ira.weiny@intel.com>

Ira

> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> ---
>  mm/memremap.c | 39 ++++++++++++++++-----------------------
>  1 file changed, 16 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 73a206d0f645..16b2fb482da1 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -41,28 +41,24 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
>  DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
>  EXPORT_SYMBOL(devmap_managed_key);
> -static void devmap_managed_enable_put(void)
> +static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
>  {
> -	static_branch_dec(&devmap_managed_key);
> +	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +	    pgmap->type == MEMORY_DEVICE_FS_DAX)
> +		static_branch_dec(&devmap_managed_key);
>  }
> -static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> +static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
> -	    (!pgmap->ops || !pgmap->ops->page_free)) {
> -		WARN(1, "Missing page_free method\n");
> -		return -EINVAL;
> -	}
> -
> -	static_branch_inc(&devmap_managed_key);
> -	return 0;
> +	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +	    pgmap->type == MEMORY_DEVICE_FS_DAX)
> +		static_branch_inc(&devmap_managed_key);
>  }
>  #else
> -static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> +static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	return -EINVAL;
>  }
> -static void devmap_managed_enable_put(void)
> +static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
>  {
>  }
>  #endif /* CONFIG_DEV_PAGEMAP_OPS */
> @@ -169,7 +165,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  		pageunmap_range(pgmap, i);
>  	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
> -	devmap_managed_enable_put();
> +	devmap_managed_enable_put(pgmap);
>  }
>  EXPORT_SYMBOL_GPL(memunmap_pages);
> @@ -307,7 +303,6 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  		.pgprot = PAGE_KERNEL,
>  	};
>  	const int nr_range = pgmap->nr_range;
> -	bool need_devmap_managed = true;
>  	int error, i;
>  	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
> @@ -323,6 +318,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  			WARN(1, "Missing migrate_to_ram method\n");
>  			return ERR_PTR(-EINVAL);
>  		}
> +		if (!pgmap->ops->page_free) {
> +			WARN(1, "Missing page_free method\n");
> +			return ERR_PTR(-EINVAL);
> +		}
>  		if (!pgmap->owner) {
>  			WARN(1, "Missing owner\n");
>  			return ERR_PTR(-EINVAL);
> @@ -336,11 +335,9 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  		}
>  		break;
>  	case MEMORY_DEVICE_GENERIC:
> -		need_devmap_managed = false;
>  		break;
>  	case MEMORY_DEVICE_PCI_P2PDMA:
>  		params.pgprot = pgprot_noncached(params.pgprot);
> -		need_devmap_managed = false;
>  		break;
>  	default:
>  		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
> @@ -364,11 +361,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  		}
>  	}
> -	if (need_devmap_managed) {
> -		error = devmap_managed_enable_get(pgmap);
> -		if (error)
> -			return ERR_PTR(error);
> -	}
> +	devmap_managed_enable_get(pgmap);
>  	/*
>  	 * Clear the pgmap nr_range as it will be incremented for each
> -- 
> 2.20.1
> 
> > >   	dev_pagemap_kill(pgmap);
> > >   	for (i = 0; i < pgmap->nr_range; i++)
> > > @@ -169,7 +179,8 @@ void memunmap_pages(struct dev_pagemap *pgmap)
> > >   		pageunmap_range(pgmap, i);
> > >   	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
> > > -	devmap_managed_enable_put();
> > > +	if (need_devmap_managed)
> > > +		devmap_managed_enable_put();
> > >   }
> > >   EXPORT_SYMBOL_GPL(memunmap_pages);
> > > @@ -307,7 +318,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
> > >   		.pgprot = PAGE_KERNEL,
> > >   	};
> > >   	const int nr_range = pgmap->nr_range;
> > > -	bool need_devmap_managed = true;
> > > +	bool need_devmap_managed = false;
> > 
> > I'm CC'ing Ralph Campbell because I think some of his work has proposed this
> > same change.
> > 
> > Ira
> 
> This part of the patch isn't strictly needed, it just reverses the default value of
> need_devmap_managed.
> 
> > >   	int error, i;
> > >   	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
> > > @@ -327,6 +338,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
> > >   			WARN(1, "Missing owner\n");
> > >   			return ERR_PTR(-EINVAL);
> > >   		}
> > > +		need_devmap_managed = true;
> > >   		break;
> > >   	case MEMORY_DEVICE_FS_DAX:
> > >   		if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
> > > @@ -334,13 +346,12 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
> > >   			WARN(1, "File system DAX not supported\n");
> > >   			return ERR_PTR(-EINVAL);
> > >   		}
> > > +		need_devmap_managed = true;
> > >   		break;
> > >   	case MEMORY_DEVICE_GENERIC:
> > > -		need_devmap_managed = false;
> > >   		break;
> > >   	case MEMORY_DEVICE_PCI_P2PDMA:
> > >   		params.pgprot = pgprot_noncached(params.pgprot);
> > > -		need_devmap_managed = false;
> > >   		break;
> > >   	default:
> > >   		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
> > > -- 
> > > 2.26.2
> > >
Aneesh Kumar K.V Oct. 23, 2020, 2:52 a.m. UTC | #6
On 10/23/20 12:40 AM, Ira Weiny wrote:
> On Thu, Oct 22, 2020 at 11:19:43AM -0700, Ralph Campbell wrote:
>>
>> On 10/22/20 8:41 AM, Ira Weiny wrote:
>>> On Thu, Oct 22, 2020 at 11:37:53AM +0530, Aneesh Kumar K.V wrote:
>>>> commit 6f42193fd86e ("memremap: don't use a separate devm action for
>>>> devmap_managed_enable_get") changed the static key updates such that we
>>>> now call devmap_managed_enable_put() without doing the equivalent
>>>> devmap_managed_enable_get().
>>>>
>>>> devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
>>>> MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
>>>> types too. This results in the below warning when switching between
>>>> system-ram and devdax mode for devdax namespace.
>>>>
>>>>    jump label: negative count!
>>>>    WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
>>>>    Modules linked in:
>>>>    ....
>>>>
>>>>    NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
>>>>    LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
>>>>    Call Trace:
>>>>    [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
>>>>    [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
>>>>    [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
>>>>    [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
>>>>    [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
>>>>    [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>>>>    [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>>>>    [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
>>>>    [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
>>>>    [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
>>>>    [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
>>>>    [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>>>>    [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>>>>    [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
>>>>    [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
>>>>    [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
>>>>    [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
>>>>    [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
>>>>    [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
>>>>    [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
>>>>    [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c
>>>>
>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Cc: linux-nvdimm@lists.01.org
>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>> Cc: Jason Gunthorpe <jgg@mellanox.com>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>>    mm/memremap.c | 19 +++++++++++++++----
>>>>    1 file changed, 15 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>> index 73a206d0f645..d4402ff3e467 100644
>>>> --- a/mm/memremap.c
>>>> +++ b/mm/memremap.c
>>>> @@ -158,6 +158,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>>    {
>>>>    	unsigned long pfn;
>>>>    	int i;
>>>> +	bool need_devmap_managed = false;
>>>> +
>>>> +	switch (pgmap->type) {
>>>> +	case MEMORY_DEVICE_PRIVATE:
>>>> +	case MEMORY_DEVICE_FS_DAX:
>>>> +		need_devmap_managed = true;
>>>> +		break;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>
>>> Is it overkill to avoid duplicating this switch logic in
>>> page_is_devmap_managed() by creating another call which can be used here?
>>
>> Perhaps. I can imagine a helper defined in include/linux/mm.h which
>> page_is_devmap_managed() could also call but that would impact a lot of
>> places that include mm.h. Since memremap.c already has to have intimate
>> knowledge of the pgmap->type, I think limiting the change to just what
>> is needed is better for now. So the patch looks OK to me.
>>
>> Looking at this some more, I would suggest changing devmap_managed_enable_get()
>> and devmap_managed_enable_put() to do the special case checking instead of
>> doing it in memremap_pages() and memunmap_pages().
>> Then devmap_managed_enable_get() doesn't need to return an error if
>> CONFIG_DEV_PAGEMAP_OPS isn't defined. I have only compile tested the
>> following.
> 
> This looks cleaner to me.  Aneesh?
> 
> FWIW:
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> 
>

Yes. You can add

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>


-aneesh
Sachin Sant Oct. 23, 2020, 6:38 a.m. UTC | #7
>> Is it overkill to avoid duplicating this switch logic in
>> page_is_devmap_managed() by creating another call which can be used here?
> 
> Perhaps. I can imagine a helper defined in include/linux/mm.h which
> page_is_devmap_managed() could also call but that would impact a lot of
> places that include mm.h. Since memremap.c already has to have intimate
> knowledge of the pgmap->type, I think limiting the change to just what
> is needed is better for now. So the patch looks OK to me.
> 
> Looking at this some more, I would suggest changing devmap_managed_enable_get()
> and devmap_managed_enable_put() to do the special case checking instead of
> doing it in memremap_pages() and memunmap_pages().
> Then devmap_managed_enable_get() doesn't need to return an error if
> CONFIG_DEV_PAGEMAP_OPS isn't defined. I have only compile tested the
> following.
> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> ---

This patch fixes the warning for me.

Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>

-Sachin
Christoph Hellwig Oct. 23, 2020, 6:46 a.m. UTC | #8
On Thu, Oct 22, 2020 at 11:19:43AM -0700, Ralph Campbell wrote:
> 
> On 10/22/20 8:41 AM, Ira Weiny wrote:
> > On Thu, Oct 22, 2020 at 11:37:53AM +0530, Aneesh Kumar K.V wrote:
> > > commit 6f42193fd86e ("memremap: don't use a separate devm action for
> > > devmap_managed_enable_get") changed the static key updates such that we
> > > now call devmap_managed_enable_put() without doing the equivalent
> > > devmap_managed_enable_get().
> > > 
> > > > devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
> > > > MEMORY_DEVICE_FS_DAX, but memunmap_pages() gets called for other pgmap
> > > > types too. This results in the warning below when switching a devdax
> > > > namespace between system-ram and devdax mode.
> > > 
> > >   jump label: negative count!
> > >   WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
> > >   Modules linked in:
> > >   ....
> > > 
> > >   NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
> > >   LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
> > >   Call Trace:
> > >   [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
> > >   [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
> > >   [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
> > >   [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
> > >   [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
> > >   [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
> > >   [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
> > >   [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
> > >   [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
> > >   [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
> > >   [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
> > >   [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
> > >   [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
> > >   [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
> > >   [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
> > >   [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
> > >   [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
> > >   [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
> > >   [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
> > >   [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
> > >   [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c
> > > 
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > > Cc: linux-nvdimm@lists.01.org
> > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > Cc: Jason Gunthorpe <jgg@mellanox.com>
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > ---
> > >   mm/memremap.c | 19 +++++++++++++++----
> > >   1 file changed, 15 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/mm/memremap.c b/mm/memremap.c
> > > index 73a206d0f645..d4402ff3e467 100644
> > > --- a/mm/memremap.c
> > > +++ b/mm/memremap.c
> > > @@ -158,6 +158,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
> > >   {
> > >   	unsigned long pfn;
> > >   	int i;
> > > +	bool need_devmap_managed = false;
> > > +
> > > +	switch (pgmap->type) {
> > > +	case MEMORY_DEVICE_PRIVATE:
> > > +	case MEMORY_DEVICE_FS_DAX:
> > > +		need_devmap_managed = true;
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > 
> > Is it overkill to avoid duplicating this switch logic in
> > page_is_devmap_managed() by creating another call which can be used here?
> 
> Perhaps. I can imagine a helper defined in include/linux/mm.h which
> page_is_devmap_managed() could also call but that would impact a lot of
> places that include mm.h. Since memremap.c already has to have intimate
> knowledge of the pgmap->type, I think limiting the change to just what
> is needed is better for now. So the patch looks OK to me.
> 
> Looking at this some more, I would suggest changing devmap_managed_enable_get()
> and devmap_managed_enable_put() to do the special case checking instead of
> doing it in memremap_pages() and memunmap_pages().
> Then devmap_managed_enable_get() doesn't need to return an error if
> CONFIG_DEV_PAGEMAP_OPS isn't defined. I have only compile tested the
> following.
> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>

This looks ok as well.  Can you submit it as a proper standalone patch?
Ralph Campbell Oct. 23, 2020, 5:29 p.m. UTC | #9
On 10/22/20 11:46 PM, Christoph Hellwig wrote:
> On Thu, Oct 22, 2020 at 11:19:43AM -0700, Ralph Campbell wrote:
>>
>> On 10/22/20 8:41 AM, Ira Weiny wrote:
>>> On Thu, Oct 22, 2020 at 11:37:53AM +0530, Aneesh Kumar K.V wrote:
>>>> commit 6f42193fd86e ("memremap: don't use a separate devm action for
>>>> devmap_managed_enable_get") changed the static key updates such that we
>>>> now call devmap_managed_enable_put() without doing the equivalent
>>>> devmap_managed_enable_get().
>>>>
>>>> devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
>>>> MEMORY_DEVICE_FS_DAX, but memunmap_pages() gets called for other pgmap
>>>> types too. This results in the warning below when switching a devdax
>>>> namespace between system-ram and devdax mode.
>>>>
>>>>    jump label: negative count!
>>>>    WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
>>>>    Modules linked in:
>>>>    ....
>>>>
>>>>    NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
>>>>    LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
>>>>    Call Trace:
>>>>    [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
>>>>    [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
>>>>    [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
>>>>    [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
>>>>    [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
>>>>    [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>>>>    [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>>>>    [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
>>>>    [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
>>>>    [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
>>>>    [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
>>>>    [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
>>>>    [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
>>>>    [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
>>>>    [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
>>>>    [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
>>>>    [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
>>>>    [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
>>>>    [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
>>>>    [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
>>>>    [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c
>>>>
>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Cc: linux-nvdimm@lists.01.org
>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>> Cc: Jason Gunthorpe <jgg@mellanox.com>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>>    mm/memremap.c | 19 +++++++++++++++----
>>>>    1 file changed, 15 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>> index 73a206d0f645..d4402ff3e467 100644
>>>> --- a/mm/memremap.c
>>>> +++ b/mm/memremap.c
>>>> @@ -158,6 +158,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>>    {
>>>>    	unsigned long pfn;
>>>>    	int i;
>>>> +	bool need_devmap_managed = false;
>>>> +
>>>> +	switch (pgmap->type) {
>>>> +	case MEMORY_DEVICE_PRIVATE:
>>>> +	case MEMORY_DEVICE_FS_DAX:
>>>> +		need_devmap_managed = true;
>>>> +		break;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>
>>> Is it overkill to avoid duplicating this switch logic in
>>> page_is_devmap_managed() by creating another call which can be used here?
>>
>> Perhaps. I can imagine a helper defined in include/linux/mm.h which
>> page_is_devmap_managed() could also call but that would impact a lot of
>> places that include mm.h. Since memremap.c already has to have intimate
>> knowledge of the pgmap->type, I think limiting the change to just what
>> is needed is better for now. So the patch looks OK to me.
>>
>> Looking at this some more, I would suggest changing devmap_managed_enable_get()
>> and devmap_managed_enable_put() to do the special case checking instead of
>> doing it in memremap_pages() and memunmap_pages().
>> Then devmap_managed_enable_get() doesn't need to return an error if
>> CONFIG_DEV_PAGEMAP_OPS isn't defined. I have only compile tested the
>> following.
>>
>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> 
> This looks ok as well.  Can you submit it as a proper standalone patch?
> 

Yes. I was just about to ask whether I should do that and you beat me to it.
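
As a reading aid, the imbalance described in the commit message can be
modelled in a few lines of user-space C. This is a sketch under stated
assumptions, not kernel code: the model_ and MODEL_ names are hypothetical,
and a plain integer stands in for the static key's enable count. The map side
takes a reference only for the two types that need it, one unmap variant
drops a reference unconditionally (the behaviour the commit message reports),
and the other repeats the type check before dropping it.

```c
#include <stdbool.h>

/* Stand-ins for the pgmap types discussed in the thread. */
enum model_type {
	MODEL_PRIVATE,
	MODEL_FS_DAX,
	MODEL_GENERIC,
	MODEL_PCI_P2PDMA,
};

/* True for the types that take a reference on the key in this model. */
static bool model_needs_key(enum model_type type)
{
	switch (type) {
	case MODEL_PRIVATE:
	case MODEL_FS_DAX:
		return true;
	default:
		return false;
	}
}

/* Map side: increments the count only when the type needs the key. */
int model_map(int count, enum model_type type)
{
	return model_needs_key(type) ? count + 1 : count;
}

/* Unbalanced unmap: always decrements, whatever the type. */
int model_unmap_unconditional(int count, enum model_type type)
{
	(void)type;
	return count - 1;
}

/* Balanced unmap: decrements only when the map side incremented. */
int model_unmap_checked(int count, enum model_type type)
{
	return model_needs_key(type) ? count - 1 : count;
}
```

Starting from a count of zero, mapping and unmapping a MODEL_GENERIC region
with the unconditional variant ends below zero, which is the condition the
"jump label: negative count!" warning reports; the checked variant ends at
zero for every type.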
Patch

diff --git a/mm/memremap.c b/mm/memremap.c
index 73a206d0f645..d4402ff3e467 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -158,6 +158,16 @@  void memunmap_pages(struct dev_pagemap *pgmap)
 {
 	unsigned long pfn;
 	int i;
+	bool need_devmap_managed = false;
+
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_FS_DAX:
+		need_devmap_managed = true;
+		break;
+	default:
+		break;
+	}
 
 	dev_pagemap_kill(pgmap);
 	for (i = 0; i < pgmap->nr_range; i++)
@@ -169,7 +179,8 @@  void memunmap_pages(struct dev_pagemap *pgmap)
 		pageunmap_range(pgmap, i);
 
 	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
-	devmap_managed_enable_put();
+	if (need_devmap_managed)
+		devmap_managed_enable_put();
 }
 EXPORT_SYMBOL_GPL(memunmap_pages);
 
@@ -307,7 +318,7 @@  void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 		.pgprot = PAGE_KERNEL,
 	};
 	const int nr_range = pgmap->nr_range;
-	bool need_devmap_managed = true;
+	bool need_devmap_managed = false;
 	int error, i;
 
 	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
@@ -327,6 +338,7 @@  void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 			WARN(1, "Missing owner\n");
 			return ERR_PTR(-EINVAL);
 		}
+		need_devmap_managed = true;
 		break;
 	case MEMORY_DEVICE_FS_DAX:
 		if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
@@ -334,13 +346,12 @@  void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 			WARN(1, "File system DAX not supported\n");
 			return ERR_PTR(-EINVAL);
 		}
+		need_devmap_managed = true;
 		break;
 	case MEMORY_DEVICE_GENERIC:
-		need_devmap_managed = false;
 		break;
 	case MEMORY_DEVICE_PCI_P2PDMA:
 		params.pgprot = pgprot_noncached(params.pgprot);
-		need_devmap_managed = false;
 		break;
 	default:
 		WARN(1, "Invalid pgmap type %d\n", pgmap->type);