diff mbox series

mm: Cleanup __put_devmap_managed_page() vs ->page_free()

Message ID 157368992671.2974225.13512647385398246617.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State Superseded
Series mm: Cleanup __put_devmap_managed_page() vs ->page_free()

Commit Message

Dan Williams Nov. 14, 2019, 12:07 a.m. UTC
After the removal of the device-public infrastructure there are only 2
->page_free() callbacks in the kernel. One of those is a device-private
callback in the nouveau driver, the other is a generic wakeup needed in
the DAX case. In the hopes that all ->page_free() callbacks can be
migrated to common core kernel functionality, move the device-private
specific actions in __put_devmap_managed_page() under the
is_device_private_page() conditional, including the ->page_free()
callback. For the other page types just open-code the generic wakeup.

Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
cases.

Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Hi John,

This applies on top of today's linux-next and passes my nvdimm unit
tests. That testing noticed that devmap_managed_enable_get() needed a
small fixup as well.

 drivers/nvdimm/pmem.c |    6 ------
 mm/memremap.c         |   22 ++++++++++++----------
 2 files changed, 12 insertions(+), 16 deletions(-)
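
The control flow this patch produces in __put_devmap_managed_page() can be
modeled outside the kernel. What follows is a minimal user-space sketch, not
kernel code: the struct layouts, the `wakeups` and `freed` counters, and the
`model_*` names are hypothetical stand-ins, and incrementing `wakeups` takes
the place of wake_up_var(&page->_refcount). The page-flag clearing and
mem_cgroup_uncharge() steps are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel definitions. */
enum memory_type {
	MEMORY_DEVICE_PRIVATE = 1,
	MEMORY_DEVICE_FSDAX,
	MEMORY_DEVICE_DEVDAX,
	MEMORY_DEVICE_PCI_P2PDMA,
};

struct page;

struct dev_pagemap_ops {
	void (*page_free)(struct page *page);
};

struct dev_pagemap {
	enum memory_type type;
	const struct dev_pagemap_ops *ops;
};

struct page {
	int refcount;
	void *mapping;
	struct dev_pagemap *pgmap;
	int wakeups;	/* counts stand-in wake_up_var() calls */
	int freed;	/* counts driver ->page_free() calls */
};

/* Stand-in for a driver callback such as the one nouveau provides. */
static void model_page_free(struct page *page)
{
	page->freed++;
}

/* Mirrors the branch structure of the patched function, nothing more. */
static void model_put_devmap_managed_page(struct page *page)
{
	int count = --page->refcount;

	if (count == 1) {
		if (page->pgmap->type == MEMORY_DEVICE_PRIVATE) {
			/* Device-private only: clear stale mapping, call the driver. */
			page->mapping = NULL;
			page->pgmap->ops->page_free(page);
		} else {
			/* All other types: the open-coded generic wakeup. */
			page->wakeups++;
		}
	}
	/* count == 0 leads to __put_page() in the kernel; not modeled here. */
}
```

In this model an FSDAX page needs no ops at all to reach the wakeup branch,
which is the property the pmem.c hunk relies on when it drops
pmem_pagemap_page_free().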

Comments

John Hubbard Nov. 14, 2019, 12:39 a.m. UTC | #1
On 11/13/19 4:07 PM, Dan Williams wrote:
> After the removal of the device-public infrastructure there are only 2
> ->page_free() call backs in the kernel. One of those is a device-private
> callback in the nouveau driver, the other is a generic wakeup needed in
> the DAX case. In the hopes that all ->page_free() callbacks can be
> migrated to common core kernel functionality, move the device-private
> specific actions in __put_devmap_managed_page() under the
> is_device_private_page() conditional, including the ->page_free()
> callback. For the other page types just open-code the generic wakeup.
> 
> Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> case.
> 
> Cc: Jan Kara <jack@suse.cz>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> Hi John,
> 
> This applies on top of today's linux-next and passes my nvdimm unit
> tests. That testing noticed that devmap_managed_enable_get() needed a
> small fixup as well.

Got it. This will appear in the next posted version of my "mm/gup: track
dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.


> 
>   drivers/nvdimm/pmem.c |    6 ------
>   mm/memremap.c         |   22 ++++++++++++----------
>   2 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index f9f76f6ba07b..21db1ce8c0ae 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
>   	put_disk(pmem->disk);
>   }
>   
> -static void pmem_pagemap_page_free(struct page *page)
> -{
> -	wake_up_var(&page->_refcount);
> -}
> -
>   static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> -	.page_free		= pmem_pagemap_page_free,
>   	.kill			= pmem_pagemap_kill,
>   	.cleanup		= pmem_pagemap_cleanup,
>   };
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 022e78e68ea0..6e6f3d6fdb73 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
>   
>   static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
>   {
> -	if (!pgmap->ops || !pgmap->ops->page_free) {
> +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> +				&& !pgmap->ops->page_free)) {


OK, so only MEMORY_DEVICE_PRIVATE has .page_free ops. That looks
correct to me, based on looking at the .page_free setters--I
only see Nouveau setting it.


thanks,
Dan Williams Nov. 14, 2019, 12:47 a.m. UTC | #2
On Wed, Nov 13, 2019 at 4:42 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 11/13/19 4:07 PM, Dan Williams wrote:
> > After the removal of the device-public infrastructure there are only 2
> > ->page_free() call backs in the kernel. One of those is a device-private
> > callback in the nouveau driver, the other is a generic wakeup needed in
> > the DAX case. In the hopes that all ->page_free() callbacks can be
> > migrated to common core kernel functionality, move the device-private
> > specific actions in __put_devmap_managed_page() under the
> > is_device_private_page() conditional, including the ->page_free()
> > callback. For the other page types just open-code the generic wakeup.
> >
> > Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> > does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> > case.
> >
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > Hi John,
> >
> > This applies on top of today's linux-next and passes my nvdimm unit
> > tests. That testing noticed that devmap_managed_enable_get() needed a
> > small fixup as well.
>
> Got it. This will appear in the next posted version of my "mm/gup: track
> dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.

Thanks!

>
>
> >
> >   drivers/nvdimm/pmem.c |    6 ------
> >   mm/memremap.c         |   22 ++++++++++++----------
> >   2 files changed, 12 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index f9f76f6ba07b..21db1ce8c0ae 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
> >       put_disk(pmem->disk);
> >   }
> >
> > -static void pmem_pagemap_page_free(struct page *page)
> > -{
> > -     wake_up_var(&page->_refcount);
> > -}
> > -
> >   static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> > -     .page_free              = pmem_pagemap_page_free,
> >       .kill                   = pmem_pagemap_kill,
> >       .cleanup                = pmem_pagemap_cleanup,
> >   };
> > diff --git a/mm/memremap.c b/mm/memremap.c
> > index 022e78e68ea0..6e6f3d6fdb73 100644
> > --- a/mm/memremap.c
> > +++ b/mm/memremap.c
> > @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
> >
> >   static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> >   {
> > -     if (!pgmap->ops || !pgmap->ops->page_free) {
> > +     if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> > +                             && !pgmap->ops->page_free)) {
>
>
> OK, so only MEMORY_DEVICE_PRIVATE has .page_free ops. That looks
> correct to me, based on looking at the .page_free setters--I
> only see Nouveau setting it.

Correct. The FSDAX case still needs to enable the 'devmap_managed_key'
static key, but other than that the core will handle all the follow-on
details.
Jerome Glisse Nov. 14, 2019, 1:24 a.m. UTC | #3
On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
> After the removal of the device-public infrastructure there are only 2
> ->page_free() call backs in the kernel. One of those is a device-private
> callback in the nouveau driver, the other is a generic wakeup needed in
> the DAX case. In the hopes that all ->page_free() callbacks can be
> migrated to common core kernel functionality, move the device-private
> specific actions in __put_devmap_managed_page() under the
> is_device_private_page() conditional, including the ->page_free()
> callback. For the other page types just open-code the generic wakeup.
> 
> Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> case.
> 
> Cc: Jan Kara <jack@suse.cz>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

All looks good to me.

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>


> ---
> Hi John,
> 
> This applies on top of today's linux-next and passes my nvdimm unit
> tests. That testing noticed that devmap_managed_enable_get() needed a
> small fixup as well.
> 
>  drivers/nvdimm/pmem.c |    6 ------
>  mm/memremap.c         |   22 ++++++++++++----------
>  2 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index f9f76f6ba07b..21db1ce8c0ae 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
>  	put_disk(pmem->disk);
>  }
>  
> -static void pmem_pagemap_page_free(struct page *page)
> -{
> -	wake_up_var(&page->_refcount);
> -}
> -
>  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> -	.page_free		= pmem_pagemap_page_free,
>  	.kill			= pmem_pagemap_kill,
>  	.cleanup		= pmem_pagemap_cleanup,
>  };
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 022e78e68ea0..6e6f3d6fdb73 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
>  
>  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	if (!pgmap->ops || !pgmap->ops->page_free) {
> +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> +				&& !pgmap->ops->page_free)) {
>  		WARN(1, "Missing page_free method\n");
>  		return -EINVAL;
>  	}
> @@ -449,12 +450,6 @@ void __put_devmap_managed_page(struct page *page)
>  	 * holds a reference on the page.
>  	 */
>  	if (count == 1) {
> -		/* Clear Active bit in case of parallel mark_page_accessed */
> -		__ClearPageActive(page);
> -		__ClearPageWaiters(page);
> -
> -		mem_cgroup_uncharge(page);
> -
>  		/*
>  		 * When a device_private page is freed, the page->mapping field
>  		 * may still contain a (stale) mapping value. For example, the
> @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
>  		 * handled differently or not done at all, so there is no need
>  		 * to clear page->mapping.
>  		 */
> -		if (is_device_private_page(page))
> -			page->mapping = NULL;
> +		if (is_device_private_page(page)) {
> +			/* Clear Active bit in case of parallel mark_page_accessed */
> +			__ClearPageActive(page);
> +			__ClearPageWaiters(page);
>  
> -		page->pgmap->ops->page_free(page);
> +			mem_cgroup_uncharge(page);
> +
> +			page->mapping = NULL;
> +			page->pgmap->ops->page_free(page);
> +		} else
> +			wake_up_var(&page->_refcount);
>  	} else if (!count)
>  		__put_page(page);
>  }
>
Christoph Hellwig Nov. 14, 2019, 7:19 a.m. UTC | #4
On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
>  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	if (!pgmap->ops || !pgmap->ops->page_free) {
> +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> +				&& !pgmap->ops->page_free)) {

I don't think this check is correct.  You only want the ops null check
for MEMORY_DEVICE_PRIVATE as well now, i.e.:

	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
	    (!pgmap->ops || !pgmap->ops->page_free)) {
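
The difference between the two conditions can be checked in isolation. This
is a minimal user-space sketch under stated assumptions: the types are
simplified stand-ins for the kernel ones, and `rejects_as_posted()`,
`rejects_as_suggested()` and `model_page_free()` are hypothetical names. Each
predicate returns true where devmap_managed_enable_get() would WARN and
return -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; illustration only. */
enum memory_type { MEMORY_DEVICE_PRIVATE = 1, MEMORY_DEVICE_FSDAX };

struct dev_pagemap_ops {
	void (*page_free)(void *page);
};

struct dev_pagemap {
	enum memory_type type;
	const struct dev_pagemap_ops *ops;
};

/* Placeholder callback so a "fully populated" ops table can be built. */
static void model_page_free(void *page)
{
	(void)page;
}

/* Condition as posted: a NULL ops pointer is rejected for every type. */
static bool rejects_as_posted(const struct dev_pagemap *pgmap)
{
	return !pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE &&
			       !pgmap->ops->page_free);
}

/* Suggested condition: both checks apply to MEMORY_DEVICE_PRIVATE only. */
static bool rejects_as_suggested(const struct dev_pagemap *pgmap)
{
	return pgmap->type == MEMORY_DEVICE_PRIVATE &&
	       (!pgmap->ops || !pgmap->ops->page_free);
}
```

The two predicates disagree in exactly one case in this model: a pagemap that
is not MEMORY_DEVICE_PRIVATE and has a NULL ops pointer is rejected by the
posted condition and accepted by the suggested one.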

> @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
>  		 * handled differently or not done at all, so there is no need
>  		 * to clear page->mapping.
>  		 */
> -		if (is_device_private_page(page))
> -			page->mapping = NULL;
> +		if (is_device_private_page(page)) {
> +			/* Clear Active bit in case of parallel mark_page_accessed */

This adds a > 80 char line.  But that whole flow of the function seems
rather odd now.

Why can't we do:

	if (count == 0) {
		__put_page(page);
	} else if (is_device_private_page(page)) {
		__ClearPageActive(page);
		__ClearPageWaiters(page);

		mem_cgroup_uncharge(page);
		page->mapping = NULL;
		page->pgmap->ops->page_free(page);
	} else {
		wake_up_var(&page->_refcount);
	}

(except for the fact that I don't get the point of calling __put_page
on a refcount of zero, but that is separate from this patch).
Christoph Hellwig Nov. 14, 2019, 7:23 a.m. UTC | #5
On Wed, Nov 13, 2019 at 04:47:38PM -0800, Dan Williams wrote:
> > Got it. This will appear in the next posted version of my "mm/gup: track
> > dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.
> 
> Thanks!

John - can you please send a small series just doing the zone device
patches rework?  That way we can review it separately and maybe even get
it into 5.5.
Dan Williams Nov. 14, 2019, 7:25 a.m. UTC | #6
On Wed, Nov 13, 2019 at 11:19 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
> >  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> >  {
> > -     if (!pgmap->ops || !pgmap->ops->page_free) {
> > +     if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> > +                             && !pgmap->ops->page_free)) {
>
> I don't think this check is correct.  You only want the ops null check
> for MEMORY_DEVICE_PRIVATE as well now, i.e.:
>
>         if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
>             (!pgmap->ops || !pgmap->ops->page_free)) {
>
> > @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
> >                * handled differently or not done at all, so there is no need
> >                * to clear page->mapping.
> >                */
> > -             if (is_device_private_page(page))
> > -                     page->mapping = NULL;
> > +             if (is_device_private_page(page)) {
> > +                     /* Clear Active bit in case of parallel mark_page_accessed */
>
> This adds a > 80 char line.  But that whole flow of the function seems
> rather odd now.
>
> Why can't we do:
>
>         if (count == 0) {
>                 __put_page(page);
>         } else if (is_device_private_page(page)) {
>                 __ClearPageActive(page);
>                 __ClearPageWaiters(page);
>
>                 mem_cgroup_uncharge(page);
>                 page->mapping = NULL;
>                 page->pgmap->ops->page_free(page);
>         } else {
>                 wake_up_var(&page->_refcount);
>         }
>

All of the above looks good to me; I will spin a v2.

> (except for the fact that I don't get the point of calling __put_page
> on a refcount of zero, but that is separate from this patch).

That looked odd to me as well until I recalled that we did that to
simplify the pgmap reference counting.

71389703839e mm, zone_device: Replace {get, put}_zone_device_page()
with a single reference to fix pmem crash

I'll add a comment in v2.
John Hubbard Nov. 14, 2019, 7:26 a.m. UTC | #7
On 11/13/19 11:23 PM, Christoph Hellwig wrote:
> On Wed, Nov 13, 2019 at 04:47:38PM -0800, Dan Williams wrote:
>>> Got it. This will appear in the next posted version of my "mm/gup: track
>>> dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.
>>
>> Thanks!
> 
> John - can you please send a small series just doing the zone device
> patches rework?  That way we can review it separately and maybe even get
> it into 5.5.
> 

Sure.


thanks,

Patch

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f9f76f6ba07b..21db1ce8c0ae 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -338,13 +338,7 @@  static void pmem_release_disk(void *__pmem)
 	put_disk(pmem->disk);
 }
 
-static void pmem_pagemap_page_free(struct page *page)
-{
-	wake_up_var(&page->_refcount);
-}
-
 static const struct dev_pagemap_ops fsdax_pagemap_ops = {
-	.page_free		= pmem_pagemap_page_free,
 	.kill			= pmem_pagemap_kill,
 	.cleanup		= pmem_pagemap_cleanup,
 };
diff --git a/mm/memremap.c b/mm/memremap.c
index 022e78e68ea0..6e6f3d6fdb73 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -27,7 +27,8 @@  static void devmap_managed_enable_put(void)
 
 static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
 {
-	if (!pgmap->ops || !pgmap->ops->page_free) {
+	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
+				&& !pgmap->ops->page_free)) {
 		WARN(1, "Missing page_free method\n");
 		return -EINVAL;
 	}
@@ -449,12 +450,6 @@  void __put_devmap_managed_page(struct page *page)
 	 * holds a reference on the page.
 	 */
 	if (count == 1) {
-		/* Clear Active bit in case of parallel mark_page_accessed */
-		__ClearPageActive(page);
-		__ClearPageWaiters(page);
-
-		mem_cgroup_uncharge(page);
-
 		/*
 		 * When a device_private page is freed, the page->mapping field
 		 * may still contain a (stale) mapping value. For example, the
@@ -476,10 +471,17 @@  void __put_devmap_managed_page(struct page *page)
 		 * handled differently or not done at all, so there is no need
 		 * to clear page->mapping.
 		 */
-		if (is_device_private_page(page))
-			page->mapping = NULL;
+		if (is_device_private_page(page)) {
+			/* Clear Active bit in case of parallel mark_page_accessed */
+			__ClearPageActive(page);
+			__ClearPageWaiters(page);
 
-		page->pgmap->ops->page_free(page);
+			mem_cgroup_uncharge(page);
+
+			page->mapping = NULL;
+			page->pgmap->ops->page_free(page);
+		} else
+			wake_up_var(&page->_refcount);
 	} else if (!count)
 		__put_page(page);
 }