Message ID | 157368992671.2974225.13512647385398246617.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: Cleanup __put_devmap_managed_page() vs ->page_free() |

On 11/13/19 4:07 PM, Dan Williams wrote:
> After the removal of the device-public infrastructure there are only 2
> ->page_free() call backs in the kernel. One of those is a device-private
> callback in the nouveau driver, the other is a generic wakeup needed in
> the DAX case. In the hopes that all ->page_free() callbacks can be
> migrated to common core kernel functionality, move the device-private
> specific actions in __put_devmap_managed_page() under the
> is_device_private_page() conditional, including the ->page_free()
> callback. For the other page types just open-code the generic wakeup.
>
> Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> case.
>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> Hi John,
>
> This applies on top of today's linux-next and passes my nvdimm unit
> tests. That testing noticed that devmap_managed_enable_get() needed a
> small fixup as well.

Got it. This will appear in the next posted version of my "mm/gup: track
dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.

>
>  drivers/nvdimm/pmem.c |    6 ------
>  mm/memremap.c         |   22 ++++++++++++----------
>  2 files changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index f9f76f6ba07b..21db1ce8c0ae 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
>  	put_disk(pmem->disk);
>  }
>
> -static void pmem_pagemap_page_free(struct page *page)
> -{
> -	wake_up_var(&page->_refcount);
> -}
> -
>  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> -	.page_free		= pmem_pagemap_page_free,
>  	.kill			= pmem_pagemap_kill,
>  	.cleanup		= pmem_pagemap_cleanup,
>  };
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 022e78e68ea0..6e6f3d6fdb73 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
>
>  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	if (!pgmap->ops || !pgmap->ops->page_free) {
> +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> +				&& !pgmap->ops->page_free)) {

OK, so only MEMORY_DEVICE_PRIVATE has .page_free ops. That looks
correct to me, based on looking at the .page_free setters--I
only see Nouveau setting it.

thanks,

On Wed, Nov 13, 2019 at 4:42 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 11/13/19 4:07 PM, Dan Williams wrote:
> > After the removal of the device-public infrastructure there are only 2
> > ->page_free() call backs in the kernel. One of those is a device-private
> > callback in the nouveau driver, the other is a generic wakeup needed in
> > the DAX case. In the hopes that all ->page_free() callbacks can be
> > migrated to common core kernel functionality, move the device-private
> > specific actions in __put_devmap_managed_page() under the
> > is_device_private_page() conditional, including the ->page_free()
> > callback. For the other page types just open-code the generic wakeup.
> >
> > Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> > does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> > case.
> >
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > Hi John,
> >
> > This applies on top of today's linux-next and passes my nvdimm unit
> > tests. That testing noticed that devmap_managed_enable_get() needed a
> > small fixup as well.
>
> Got it. This will appear in the next posted version of my "mm/gup: track
> dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.

Thanks!

> >
> >  drivers/nvdimm/pmem.c |    6 ------
> >  mm/memremap.c         |   22 ++++++++++++----------
> >  2 files changed, 12 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index f9f76f6ba07b..21db1ce8c0ae 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
> >  	put_disk(pmem->disk);
> >  }
> >
> > -static void pmem_pagemap_page_free(struct page *page)
> > -{
> > -	wake_up_var(&page->_refcount);
> > -}
> > -
> >  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> > -	.page_free		= pmem_pagemap_page_free,
> >  	.kill			= pmem_pagemap_kill,
> >  	.cleanup		= pmem_pagemap_cleanup,
> >  };
> > diff --git a/mm/memremap.c b/mm/memremap.c
> > index 022e78e68ea0..6e6f3d6fdb73 100644
> > --- a/mm/memremap.c
> > +++ b/mm/memremap.c
> > @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
> >
> >  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> >  {
> > -	if (!pgmap->ops || !pgmap->ops->page_free) {
> > +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> > +				&& !pgmap->ops->page_free)) {
>
>
> OK, so only MEMORY_DEVICE_PRIVATE has .page_free ops. That looks
> correct to me, based on looking at the .page_free setters--I
> only see Nouveau setting it.

Correct. The FSDAX case still needs to enable the 'devmap_managed_key'
static key, but other than that the core will handle all the follow-on
details.

On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
> After the removal of the device-public infrastructure there are only 2
> ->page_free() call backs in the kernel. One of those is a device-private
> callback in the nouveau driver, the other is a generic wakeup needed in
> the DAX case. In the hopes that all ->page_free() callbacks can be
> migrated to common core kernel functionality, move the device-private
> specific actions in __put_devmap_managed_page() under the
> is_device_private_page() conditional, including the ->page_free()
> callback. For the other page types just open-code the generic wakeup.
>
> Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> case.
>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

All looks good to me.

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

> ---
> Hi John,
>
> This applies on top of today's linux-next and passes my nvdimm unit
> tests. That testing noticed that devmap_managed_enable_get() needed a
> small fixup as well.
>
>  drivers/nvdimm/pmem.c |    6 ------
>  mm/memremap.c         |   22 ++++++++++++----------
>  2 files changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index f9f76f6ba07b..21db1ce8c0ae 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
>  	put_disk(pmem->disk);
>  }
>
> -static void pmem_pagemap_page_free(struct page *page)
> -{
> -	wake_up_var(&page->_refcount);
> -}
> -
>  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> -	.page_free		= pmem_pagemap_page_free,
>  	.kill			= pmem_pagemap_kill,
>  	.cleanup		= pmem_pagemap_cleanup,
>  };
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 022e78e68ea0..6e6f3d6fdb73 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
>
>  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	if (!pgmap->ops || !pgmap->ops->page_free) {
> +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> +				&& !pgmap->ops->page_free)) {
>  		WARN(1, "Missing page_free method\n");
>  		return -EINVAL;
>  	}
> @@ -449,12 +450,6 @@ void __put_devmap_managed_page(struct page *page)
>  	 * holds a reference on the page.
>  	 */
>  	if (count == 1) {
> -		/* Clear Active bit in case of parallel mark_page_accessed */
> -		__ClearPageActive(page);
> -		__ClearPageWaiters(page);
> -
> -		mem_cgroup_uncharge(page);
> -
>  		/*
>  		 * When a device_private page is freed, the page->mapping field
>  		 * may still contain a (stale) mapping value. For example, the
> @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
>  		 * handled differently or not done at all, so there is no need
>  		 * to clear page->mapping.
>  		 */
> -		if (is_device_private_page(page))
> -			page->mapping = NULL;
> +		if (is_device_private_page(page)) {
> +			/* Clear Active bit in case of parallel mark_page_accessed */
> +			__ClearPageActive(page);
> +			__ClearPageWaiters(page);
>
> -		page->pgmap->ops->page_free(page);
> +			mem_cgroup_uncharge(page);
> +
> +			page->mapping = NULL;
> +			page->pgmap->ops->page_free(page);
> +		} else
> +			wake_up_var(&page->_refcount);
>  	} else if (!count)
>  		__put_page(page);
>  }
>

On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
>  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	if (!pgmap->ops || !pgmap->ops->page_free) {
> +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> +				&& !pgmap->ops->page_free)) {

I don't think this check is correct. You only want the ops null check
for MEMORY_DEVICE_PRIVATE as well now, i.e.:

	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
	    (!pgmap->ops || !pgmap->ops->page_free)) {

> @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
>  		 * handled differently or not done at all, so there is no need
>  		 * to clear page->mapping.
>  		 */
> -		if (is_device_private_page(page))
> -			page->mapping = NULL;
> +		if (is_device_private_page(page)) {
> +			/* Clear Active bit in case of parallel mark_page_accessed */

This adds a > 80 char line. But that whole flow of the function seems
rather odd now.

Why can't we do:

	if (count == 0) {
		__put_page(page);
	} else if (is_device_private_page(page)) {
		__ClearPageActive(page);
		__ClearPageWaiters(page);

		mem_cgroup_uncharge(page);
		page->mapping = NULL;
		page->pgmap->ops->page_free(page);
	} else {
		wake_up_var(&page->_refcount);
	}

(except for the fact that I don't get the point of calling __put_page
on a refcount of zero, but that is separate from this patch).
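
The two forms of the check discussed in this review differ only for a pagemap that is not device-private and supplies no ops at all. The following is a minimal userspace sketch of that difference, not kernel code: every `mock_`/`MOCK_` name, `v1_check_rejects()` and `review_check_rejects()` are hypothetical stand-ins introduced here for illustration, and only the boolean expressions are taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; only the fields the check reads. */
enum mock_memory_type {
	MOCK_DEVICE_PRIVATE = 1,
	MOCK_DEVICE_FSDAX,
};

struct mock_page;

struct mock_pagemap_ops {
	void (*page_free)(struct mock_page *page);
};

struct mock_pgmap {
	enum mock_memory_type type;
	const struct mock_pagemap_ops *ops;
};

/* Expression as posted in the patch: NULL ops is rejected for every type. */
static bool v1_check_rejects(const struct mock_pgmap *pgmap)
{
	return !pgmap->ops ||
	       (pgmap->type == MOCK_DEVICE_PRIVATE && !pgmap->ops->page_free);
}

/* Expression suggested in review: only device-private requires page_free. */
static bool review_check_rejects(const struct mock_pgmap *pgmap)
{
	return pgmap->type == MOCK_DEVICE_PRIVATE &&
	       (!pgmap->ops || !pgmap->ops->page_free);
}

static void mock_page_free(struct mock_page *page)
{
	(void)page;
}

/* Walks the cases; the two checks disagree only for fsdax_no_ops. */
void check_examples(void)
{
	const struct mock_pagemap_ops with_free = { .page_free = mock_page_free };
	const struct mock_pagemap_ops without_free = { .page_free = NULL };

	struct mock_pgmap private_ok = { .type = MOCK_DEVICE_PRIVATE, .ops = &with_free };
	struct mock_pgmap private_bad = { .type = MOCK_DEVICE_PRIVATE, .ops = &without_free };
	struct mock_pgmap private_no_ops = { .type = MOCK_DEVICE_PRIVATE, .ops = NULL };
	struct mock_pgmap fsdax_ops = { .type = MOCK_DEVICE_FSDAX, .ops = &without_free };
	struct mock_pgmap fsdax_no_ops = { .type = MOCK_DEVICE_FSDAX, .ops = NULL };

	assert(!v1_check_rejects(&private_ok) && !review_check_rejects(&private_ok));
	assert(v1_check_rejects(&private_bad) && review_check_rejects(&private_bad));
	assert(v1_check_rejects(&private_no_ops) && review_check_rejects(&private_no_ops));
	assert(!v1_check_rejects(&fsdax_ops) && !review_check_rejects(&fsdax_ops));

	/* The one case where the two expressions disagree. */
	assert(v1_check_rejects(&fsdax_no_ops));
	assert(!review_check_rejects(&fsdax_no_ops));
}
```

In this model the reviewed form never dereferences `ops` for non-private types, so a NULL `ops` pointer there is harmless to the check itself.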

On Wed, Nov 13, 2019 at 04:47:38PM -0800, Dan Williams wrote:
> > Got it. This will appear in the next posted version of my "mm/gup: track
> > dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.
>
> Thanks!

John - can you please send a small series just doing the zone device
patches rework? That way we can review it separately and maybe even get
it into 5.5.

On Wed, Nov 13, 2019 at 11:19 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
> >  static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> >  {
> > -	if (!pgmap->ops || !pgmap->ops->page_free) {
> > +	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> > +				&& !pgmap->ops->page_free)) {
>
> I don't think this check is correct. You only want the ops null check
> for MEMORY_DEVICE_PRIVATE as well now, i.e.:
>
> 	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
> 	    (!pgmap->ops || !pgmap->ops->page_free)) {
>
> > @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
> >  		 * handled differently or not done at all, so there is no need
> >  		 * to clear page->mapping.
> >  		 */
> > -		if (is_device_private_page(page))
> > -			page->mapping = NULL;
> > +		if (is_device_private_page(page)) {
> > +			/* Clear Active bit in case of parallel mark_page_accessed */
>
> This adds a > 80 char line. But that whole flow of the function seems
> rather odd now.
>
> Why can't we do:
>
> 	if (count == 0) {
> 		__put_page(page);
> 	} else if (is_device_private_page(page)) {
> 		__ClearPageActive(page);
> 		__ClearPageWaiters(page);
>
> 		mem_cgroup_uncharge(page);
> 		page->mapping = NULL;
> 		page->pgmap->ops->page_free(page);
> 	} else {
> 		wake_up_var(&page->_refcount);
> 	}
>

All the above looks good to me, will spin a v2.

> (except for the fact that I don't get the point of calling __put_page
> on a refcount of zero, but that is separate from this patch).

That looked odd to me as well until I recalled that we did that to
simplify the pgmap reference counting.

71389703839e mm, zone_device: Replace {get, put}_zone_device_page()
with a single reference to fix pmem crash

I'll add a comment in v2.

On 11/13/19 11:23 PM, Christoph Hellwig wrote:
> On Wed, Nov 13, 2019 at 04:47:38PM -0800, Dan Williams wrote:
>>> Got it. This will appear in the next posted version of my "mm/gup: track
>>> dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.
>>
>> Thanks!
>
> John - can you please send a small series just doing the zone device
> patches rework? That way we can review it separately and maybe even get
> it into 5.5.
>

Sure.

thanks,

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f9f76f6ba07b..21db1ce8c0ae 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
 	put_disk(pmem->disk);
 }
 
-static void pmem_pagemap_page_free(struct page *page)
-{
-	wake_up_var(&page->_refcount);
-}
-
 static const struct dev_pagemap_ops fsdax_pagemap_ops = {
-	.page_free		= pmem_pagemap_page_free,
 	.kill			= pmem_pagemap_kill,
 	.cleanup		= pmem_pagemap_cleanup,
 };
diff --git a/mm/memremap.c b/mm/memremap.c
index 022e78e68ea0..6e6f3d6fdb73 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
 
 static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
 {
-	if (!pgmap->ops || !pgmap->ops->page_free) {
+	if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
+				&& !pgmap->ops->page_free)) {
 		WARN(1, "Missing page_free method\n");
 		return -EINVAL;
 	}
@@ -449,12 +450,6 @@ void __put_devmap_managed_page(struct page *page)
 	 * holds a reference on the page.
 	 */
 	if (count == 1) {
-		/* Clear Active bit in case of parallel mark_page_accessed */
-		__ClearPageActive(page);
-		__ClearPageWaiters(page);
-
-		mem_cgroup_uncharge(page);
-
 		/*
 		 * When a device_private page is freed, the page->mapping field
 		 * may still contain a (stale) mapping value. For example, the
@@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
 		 * handled differently or not done at all, so there is no need
 		 * to clear page->mapping.
 		 */
-		if (is_device_private_page(page))
-			page->mapping = NULL;
+		if (is_device_private_page(page)) {
+			/* Clear Active bit in case of parallel mark_page_accessed */
+			__ClearPageActive(page);
+			__ClearPageWaiters(page);
 
-		page->pgmap->ops->page_free(page);
+			mem_cgroup_uncharge(page);
+
+			page->mapping = NULL;
+			page->pgmap->ops->page_free(page);
+		} else
+			wake_up_var(&page->_refcount);
 	} else if (!count)
 		__put_page(page);
 }

After the removal of the device-public infrastructure there are only 2
->page_free() call backs in the kernel. One of those is a device-private
callback in the nouveau driver, the other is a generic wakeup needed in
the DAX case. In the hopes that all ->page_free() callbacks can be
migrated to common core kernel functionality, move the device-private
specific actions in __put_devmap_managed_page() under the
is_device_private_page() conditional, including the ->page_free()
callback. For the other page types just open-code the generic wakeup.

Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
case.

Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Hi John,

This applies on top of today's linux-next and passes my nvdimm unit
tests. That testing noticed that devmap_managed_enable_get() needed a
small fixup as well.

 drivers/nvdimm/pmem.c |    6 ------
 mm/memremap.c         |   22 ++++++++++++----------
 2 files changed, 12 insertions(+), 16 deletions(-)
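
The dispatch that the changelog describes for __put_devmap_managed_page() can be summarised in a small userspace model. This is only a sketch of the branch structure visible in the posted diff: the `mock_` names are hypothetical, and it assumes `count` is the page reference count observed after this put's decrement, which the diff context does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for what the release path does; not kernel symbols. */
enum mock_release_action {
	MOCK_NOTHING,		/* other references remain */
	MOCK_DRIVER_PAGE_FREE,	/* device-private: cleanup, then ->page_free() */
	MOCK_GENERIC_WAKEUP,	/* other types: wake_up_var(&page->_refcount) */
	MOCK_PUT_PAGE,		/* count reached zero: __put_page() */
};

/*
 * Mirrors the if / else-if structure in the posted diff.
 * Assumption: count is the refcount value after the decrement.
 */
enum mock_release_action mock_release(int count, bool device_private)
{
	if (count == 1)
		return device_private ? MOCK_DRIVER_PAGE_FREE
				      : MOCK_GENERIC_WAKEUP;
	if (count == 0)
		return MOCK_PUT_PAGE;
	return MOCK_NOTHING;
}
```

In this model only the device-private branch reaches a driver callback; every other page type gets the generic wakeup, which is why the pmem driver's own ->page_free() wrapper can be dropped.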