Message ID: 20171205003443.22111-3-hch@lst.de (mailing list archive)
State:      New, archived
On Mon, Dec 4, 2017 at 4:34 PM, Christoph Hellwig <hch@lst.de> wrote:
> Both callers of get_dev_pagemap that pass in a pgmap don't actually hold a
> reference to the pgmap they pass in, contrary to the comment in the function.
>
> Change the calling convention so that get_dev_pagemap always consumes the
> previous reference instead of doing this using an explicit earlier call to
> put_dev_pagemap in the callers.
>
> The callers will still need to put the final reference after finishing the
> loop over the pages.

I don't think we need this change, but perhaps the reasoning should be
added to the code as a comment... details below.

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  kernel/memremap.c | 17 +++++++++--------
>  mm/gup.c          |  7 +++++--
>  2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index f0b54eca85b0..502fa107a585 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
>   * @pfn: page frame number to lookup page_map
>   * @pgmap: optional known pgmap that already has a reference
>   *
> - * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
> - * same mapping.
> + * If @pgmap is non-NULL and covers @pfn it will be returned as-is. If @pgmap
> + * is non-NULL but does not cover @pfn, the reference to it will be released.
>   */
>  struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
>  		struct dev_pagemap *pgmap)
>  {
> -	const struct resource *res = pgmap ? pgmap->res : NULL;
>  	resource_size_t phys = PFN_PHYS(pfn);
>
>  	/*
> -	 * In the cached case we're already holding a live reference so
> -	 * we can simply do a blind increment
> +	 * In the cached case we're already holding a live reference.
>  	 */
> -	if (res && phys >= res->start && phys <= res->end) {
> -		percpu_ref_get(pgmap->ref);
> -		return pgmap;
> +	if (pgmap) {
> +		const struct resource *res = pgmap ? pgmap->res : NULL;
> +
> +		if (res && phys >= res->start && phys <= res->end)
> +			return pgmap;
> +		put_dev_pagemap(pgmap);
>  	}
>
>  	/* fall back to slow path lookup */
> diff --git a/mm/gup.c b/mm/gup.c
> index d3fb60e5bfac..9d142eb9e2e9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>
>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>
> -		put_dev_pagemap(pgmap);
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		(*nr)++;
> @@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  	ret = 1;
>
>  pte_unmap:
> +	if (pgmap)
> +		put_dev_pagemap(pgmap);
>  	pte_unmap(ptem);
>  	return ret;
>  }
> @@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		get_page(page);
> -		put_dev_pagemap(pgmap);

It's safe to do the put_dev_pagemap() here because the pgmap cannot be
released until the corresponding put_page() for that get_page() we just
did occurs. So we're only holding the pgmap reference long enough to
take the individual page references.

We used to take and put individual pgmap references inside get_page() /
put_page(), but that got simplified in this commit to taking and putting
a single reference at devm_memremap_pages() setup / teardown time:

    71389703839e mm, zone_device: Replace {get, put}_zone_device_page()
    with a single reference to fix pmem crash
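To make that lifetime argument concrete, here is a minimal editorial
sketch of the ordering, not actual kernel control flow; it assumes pfn,
page, and pgmap are set up as in __gup_device_huge():

	pgmap = get_dev_pagemap(pfn, pgmap);	/* pgmap pinned by our reference */
	get_page(page);				/* per Dan's point above, this page
						 * reference now also keeps the
						 * pgmap from being released */
	put_dev_pagemap(pgmap);			/* safe: cannot be the last
						 * reference while the page
						 * reference is held */
	/* ... much later, in the consumer of pages[] ... */
	put_page(page);				/* only now can the pgmap go away */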
On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
> I don't think we need this change, but perhaps the reasoning should be
> added to the code as a comment... details below.

Hmm, looks like we are OK at least. But even if it's not a correctness
issue, there is no point in decrementing and incrementing the reference
count every time.
On Wed, Dec 6, 2017 at 2:44 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Dec 05, 2017 at 06:43:36PM -0800, Dan Williams wrote:
>> I don't think we need this change, but perhaps the reasoning should be
>> added to the code as a comment... details below.
>
> Hmm, looks like we are OK at least. But even if it's not a correctness
> issue, there is no point in decrementing and incrementing the reference
> count every time.

True, we can take it once and drop it at the end when all the related
page references have been taken.
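The pattern this agreement settles on, condensed into a minimal sketch
adapted from the patched __gup_device_huge() shown below (the function
name is made up for illustration, and SetPageReferenced() plus the
huge-page specifics are omitted):

	static int gup_device_range_sketch(unsigned long pfn, unsigned long addr,
			unsigned long end, struct page **pages, int *nr)
	{
		struct dev_pagemap *pgmap = NULL;

		do {
			struct page *page = pfn_to_page(pfn);

			/* Consumes the previous reference; returns a
			 * reference to the pgmap covering @pfn, or NULL. */
			pgmap = get_dev_pagemap(pfn, pgmap);
			if (!pgmap)
				return 0;
			get_page(page);		/* per-page reference */
			pages[(*nr)++] = page;
			pfn++;
		} while (addr += PAGE_SIZE, addr != end);

		if (pgmap)
			put_dev_pagemap(pgmap);	/* the one final put */
		return 1;
	}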
diff --git a/kernel/memremap.c b/kernel/memremap.c
index f0b54eca85b0..502fa107a585 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
  * @pfn: page frame number to lookup page_map
  * @pgmap: optional known pgmap that already has a reference
  *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
+ * If @pgmap is non-NULL and covers @pfn it will be returned as-is. If @pgmap
+ * is non-NULL but does not cover @pfn, the reference to it will be released.
  */
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 		struct dev_pagemap *pgmap)
 {
-	const struct resource *res = pgmap ? pgmap->res : NULL;
 	resource_size_t phys = PFN_PHYS(pfn);
 
 	/*
-	 * In the cached case we're already holding a live reference so
-	 * we can simply do a blind increment
+	 * In the cached case we're already holding a live reference.
 	 */
-	if (res && phys >= res->start && phys <= res->end) {
-		percpu_ref_get(pgmap->ref);
-		return pgmap;
+	if (pgmap) {
+		const struct resource *res = pgmap ? pgmap->res : NULL;
+
+		if (res && phys >= res->start && phys <= res->end)
+			return pgmap;
+		put_dev_pagemap(pgmap);
 	}
 
 	/* fall back to slow path lookup */
diff --git a/mm/gup.c b/mm/gup.c
index d3fb60e5bfac..9d142eb9e2e9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
-		put_dev_pagemap(pgmap);
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
@@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 	ret = 1;
 
 pte_unmap:
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	pte_unmap(ptem);
 	return ret;
 }
@@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		get_page(page);
-		put_dev_pagemap(pgmap);
 		(*nr)++;
 		pfn++;
 	} while (addr += PAGE_SIZE, addr != end);
+
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	return 1;
 }
Both callers of get_dev_pagemap that pass in a pgmap don't actually hold a
reference to the pgmap they pass in, contrary to the comment in the
function.

Change the calling convention so that get_dev_pagemap always consumes the
previous reference instead of doing this using an explicit earlier call to
put_dev_pagemap in the callers.

The callers will still need to put the final reference after finishing the
loop over the pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/memremap.c | 17 +++++++++--------
 mm/gup.c          |  7 +++++--
 2 files changed, 14 insertions(+), 10 deletions(-)
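Read from the callee's side, the convention this message describes
amounts to the following condensed sketch of the patched
get_dev_pagemap(); pgmap_covers() and slow_path_lookup() are
hypothetical stand-ins for the resource-range check and the slow-path
lookup in kernel/memremap.c:

	struct dev_pagemap *get_dev_pagemap_sketch(unsigned long pfn,
			struct dev_pagemap *pgmap)
	{
		if (pgmap) {
			/* Cached case: the passed-in reference is reused... */
			if (pgmap_covers(pgmap, pfn))
				return pgmap;
			/* ...or consumed when @pfn lands in another mapping. */
			put_dev_pagemap(pgmap);
		}
		/* Slow path: look up the pgmap and take a new reference. */
		return slow_path_lookup(pfn);
	}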