
[2/2] mm: fix dev_pagemap reference counting around get_dev_pagemap

Message ID 20171205003443.22111-3-hch@lst.de (mailing list archive)
State New, archived

Commit Message

Christoph Hellwig Dec. 5, 2017, 12:34 a.m. UTC
Both callers of get_dev_pagemap that pass in a pgmap don't actually hold a
reference to the pgmap they pass in, contrary to the comment in the function.

Change the calling convention so that get_dev_pagemap always consumes the
previous reference instead of requiring the callers to drop it with an
explicit earlier call to put_dev_pagemap.

The callers will still need to put the final reference after finishing the
loop over the pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/memremap.c | 17 +++++++++--------
 mm/gup.c          |  7 +++++--
 2 files changed, 14 insertions(+), 10 deletions(-)
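
For reference, the caller-side pattern under the new convention looks roughly
like this (a minimal sketch of a gup-style loop, not the literal mm/gup.c
code; the helper name is made up and unwinding of already-pinned pages on
failure is elided):

#include <linux/memremap.h>
#include <linux/mm.h>

static int walk_device_pfns(unsigned long pfn, unsigned long nr_pfns,
			    struct page **pages)
{
	struct dev_pagemap *pgmap = NULL;
	unsigned long i;
	int ret = 1;

	for (i = 0; i < nr_pfns; i++, pfn++) {
		/* returns @pgmap as-is while it still covers @pfn */
		pgmap = get_dev_pagemap(pfn, pgmap);
		if (!pgmap) {
			ret = 0;
			break;
		}
		pages[i] = pfn_to_page(pfn);
		get_page(pages[i]);
	}

	/* drop the final reference after the loop */
	if (pgmap)
		put_dev_pagemap(pgmap);
	return ret;
}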

Comments

Dan Williams Dec. 6, 2017, 2:43 a.m. UTC | #1
On Mon, Dec 4, 2017 at 4:34 PM, Christoph Hellwig <hch@lst.de> wrote:
> Both callers of get_dev_pagemap that pass in a pgmap don't actually hold a
> reference to the pgmap they pass in, contrary to the comment in the function.
>
> Change the calling convention so that get_dev_pagemap always consumes the
> previous reference instead of doing this using an explicit earlier call to
> put_dev_pagemap in the callers.
>
> The callers will still need to put the final reference after finishing the
> loop over the pages.

I don't think we need this change, but perhaps the reasoning should be
added to the code as a comment... details below.

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  kernel/memremap.c | 17 +++++++++--------
>  mm/gup.c          |  7 +++++--
>  2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index f0b54eca85b0..502fa107a585 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
>   * @pfn: page frame number to lookup page_map
>   * @pgmap: optional known pgmap that already has a reference
>   *
> - * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
> - * same mapping.
> + * If @pgmap is non-NULL and covers @pfn it will be returned as-is.  If @pgmap
> + * is non-NULL but does not cover @pfn the reference to it will be released.
>   */
>  struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
>                 struct dev_pagemap *pgmap)
>  {
> -       const struct resource *res = pgmap ? pgmap->res : NULL;
>         resource_size_t phys = PFN_PHYS(pfn);
>
>         /*
> -        * In the cached case we're already holding a live reference so
> -        * we can simply do a blind increment
> +        * In the cached case we're already holding a live reference.
>          */
> -       if (res && phys >= res->start && phys <= res->end) {
> -               percpu_ref_get(pgmap->ref);
> -               return pgmap;
> +       if (pgmap) {
> +               const struct resource *res = pgmap ? pgmap->res : NULL;
> +
> +               if (res && phys >= res->start && phys <= res->end)
> +                       return pgmap;
> +               put_dev_pagemap(pgmap);
>         }
>
>         /* fall back to slow path lookup */
> diff --git a/mm/gup.c b/mm/gup.c
> index d3fb60e5bfac..9d142eb9e2e9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>
>                 VM_BUG_ON_PAGE(compound_head(page) != head, page);
>
> -               put_dev_pagemap(pgmap);
>                 SetPageReferenced(page);
>                 pages[*nr] = page;
>                 (*nr)++;
> @@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>         ret = 1;
>
>  pte_unmap:
> +       if (pgmap)
> +               put_dev_pagemap(pgmap);
>         pte_unmap(ptem);
>         return ret;
>  }
> @@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>                 SetPageReferenced(page);
>                 pages[*nr] = page;
>                 get_page(page);
> -               put_dev_pagemap(pgmap);

It's safe to do the put_dev_pagemap() here because the pgmap cannot be
released until the put_page() corresponding to the get_page() we just
did occurs. So we're only holding the pgmap reference long enough to
take the individual page references.

We used to take and put individual pgmap references inside get_page()
/ put_page(), but that got simplified to taking and putting a single
reference at devm_memremap_pages() setup / teardown time in commit
71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page()
with a single reference to fix pmem crash").
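
The ordering can be sketched as follows (a schematic, hypothetical helper to
illustrate the lifetime argument, not actual mm/gup.c code):

/*
 * The pgmap reference only needs to outlive the window between looking
 * up the page and pinning it: once get_page() has elevated the page
 * count, the pgmap cannot be torn down until the matching put_page(),
 * so dropping the pgmap reference right after the page pin is safe.
 */
static struct page *pin_device_pfn(unsigned long pfn)
{
	struct dev_pagemap *pgmap;
	struct page *page;

	pgmap = get_dev_pagemap(pfn, NULL);	/* pgmap pinned across the lookup */
	if (!pgmap)
		return NULL;

	page = pfn_to_page(pfn);
	get_page(page);				/* page pin keeps the pgmap alive */
	put_dev_pagemap(pgmap);			/* safe: teardown waits for put_page() */

	return page;				/* caller does put_page() when done */
}
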
Christoph Hellwig Dec. 6, 2017, 10:44 p.m. UTC | #2
On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
> I don't think we need this change, but perhaps the reasoning should be
> added to the code as a comment... details below.

Hmm, looks like we are OK at least.  But even if it's not a correctness
issue there is no point in decrementing and incrementing the reference
count every time.
Dan Williams Dec. 6, 2017, 10:52 p.m. UTC | #3
On Wed, Dec 6, 2017 at 2:44 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
>> I don't think we need this change, but perhaps the reasoning should be
>> added to the code as a comment... details below.
>
> Hmm, looks like we are ok at least.  But even if it's not a correctness
> issue there is no good point in decrementing and incrementing the
> reference count every time.

True, we can take it once and drop it at the end when all the related
page references have been taken.
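
For comparison, the per-iteration churn being avoided looked roughly like
this before the change (a simplified sketch of the old pattern, assuming
every pfn hits the cached pgmap; the helper name is made up and this is not
the literal mm/gup.c code):

/*
 * Old pattern (sketch): even on a cache hit get_dev_pagemap() did a
 * percpu_ref_get(), and the caller immediately followed the page pin
 * with a put_dev_pagemap(), so every iteration paid for one reference
 * increment and one decrement.
 */
static int old_walk_device_pfns(unsigned long pfn, unsigned long nr_pfns,
				struct page **pages)
{
	struct dev_pagemap *pgmap = NULL;
	unsigned long i;

	for (i = 0; i < nr_pfns; i++, pfn++) {
		pgmap = get_dev_pagemap(pfn, pgmap);	/* reference taken here */
		if (!pgmap)
			return 0;
		pages[i] = pfn_to_page(pfn);
		get_page(pages[i]);
		put_dev_pagemap(pgmap);			/* and dropped again here */
	}
	return 1;
}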

Patch

diff --git a/kernel/memremap.c b/kernel/memremap.c
index f0b54eca85b0..502fa107a585 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -506,22 +506,23 @@  struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
  * @pfn: page frame number to lookup page_map
  * @pgmap: optional known pgmap that already has a reference
  *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
+ * If @pgmap is non-NULL and covers @pfn it will be returned as-is.  If @pgmap
+ * is non-NULL but does not cover @pfn the reference to it will be released.
  */
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 		struct dev_pagemap *pgmap)
 {
-	const struct resource *res = pgmap ? pgmap->res : NULL;
 	resource_size_t phys = PFN_PHYS(pfn);
 
 	/*
-	 * In the cached case we're already holding a live reference so
-	 * we can simply do a blind increment
+	 * In the cached case we're already holding a live reference.
 	 */
-	if (res && phys >= res->start && phys <= res->end) {
-		percpu_ref_get(pgmap->ref);
-		return pgmap;
+	if (pgmap) {
+		const struct resource *res = pgmap ? pgmap->res : NULL;
+
+		if (res && phys >= res->start && phys <= res->end)
+			return pgmap;
+		put_dev_pagemap(pgmap);
 	}
 
 	/* fall back to slow path lookup */
diff --git a/mm/gup.c b/mm/gup.c
index d3fb60e5bfac..9d142eb9e2e9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1410,7 +1410,6 @@  static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
-		put_dev_pagemap(pgmap);
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
@@ -1420,6 +1419,8 @@  static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 	ret = 1;
 
 pte_unmap:
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	pte_unmap(ptem);
 	return ret;
 }
@@ -1459,10 +1460,12 @@  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		get_page(page);
-		put_dev_pagemap(pgmap);
 		(*nr)++;
 		pfn++;
 	} while (addr += PAGE_SIZE, addr != end);
+
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	return 1;
 }