Message ID: 20201208172901.17384-9-joao.m.martins@oracle.com
State: New, archived
Series: mm, sparse-vmemmap: Introduce compound pagemaps
On Tue, Dec 08, 2020 at 05:28:59PM +0000, Joao Martins wrote:
> Rather than decrementing the ref count one by one, we
> walk the page array and check which belong to the same
> compound_head. Later on we decrement the calculated amount
> of references in a single write to the head page.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>  mm/gup.c | 41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 194e6981eb03..3a9a7229f418 100644
> +++ b/mm/gup.c
> @@ -212,6 +212,18 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>  }
>  #endif /* CONFIG_DEV_PAGEMAP_OPS */
>
> +static int record_refs(struct page **pages, int npages)
> +{
> +        struct page *head = compound_head(pages[0]);
> +        int refs = 1, index;
> +
> +        for (index = 1; index < npages; index++, refs++)
> +                if (compound_head(pages[index]) != head)
> +                        break;
> +
> +        return refs;
> +}
> +
>  /**
>   * unpin_user_page() - release a dma-pinned page
>   * @page: pointer to page to be released
> @@ -221,9 +233,9 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>   * that such pages can be separately tracked and uniquely handled. In
>   * particular, interactions with RDMA and filesystems need special handling.
>   */
> -void unpin_user_page(struct page *page)
> +static void __unpin_user_page(struct page *page, int refs)

Refs should be unsigned everywhere.

I suggest using clear language: 'page' here should always be a compound
head called 'head' (or do we have another common variable name for
this?)

'refs' is the number of tail pages within the compound, so 'ntails' or
something.

> {
> -        int refs = 1;
> +        int orig_refs = refs;
>
>         page = compound_head(page);

The caller should always do this.

> @@ -237,14 +249,19 @@ void unpin_user_page(struct page *page)
>                 return;
>
>         if (hpage_pincount_available(page))
> -                hpage_pincount_sub(page, 1);
> +                hpage_pincount_sub(page, refs);
>         else
> -                refs = GUP_PIN_COUNTING_BIAS;
> +                refs *= GUP_PIN_COUNTING_BIAS;
>
>         if (page_ref_sub_and_test(page, refs))
>                 __put_page(page);
>
> -        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
> +        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, orig_refs);
> +}

And really this should be placed directly after try_grab_compound_head()
and be given a similar name, 'unpin_compound_head()'. Even better would
be to split the FOLL_PIN part into a function so there was a clear
logical pairing.

And reviewing it like that, I want to ask if this unpin sequence is in
the right order... I would expect it to be the reverse order of the get.

John?

Is it safe to call mod_node_page_state() after releasing the refcount?
This could race with hot-unplugging the struct pages, so I think it is
wrong.

> +void unpin_user_page(struct page *page)
> +{
> +        __unpin_user_page(page, 1);

Thus this is:

        __unpin_user_page(compound_head(page), 1);

> @@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>                                  bool make_dirty)
> {
>         unsigned long index;
> +        int refs = 1;
>
>         /*
>          * TODO: this can be optimized for huge pages: if a series of pages is
> @@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>                 return;
>         }
>
> -        for (index = 0; index < npages; index++) {
> +        for (index = 0; index < npages; index += refs) {
>                 struct page *page = compound_head(pages[index]);
> +

I think this is really hard to read; it should end up as something like:

        for_each_compound_head(page_list, page_list_len, &head, &ntails) {
                if (!PageDirty(head))
                        set_page_dirty_lock(head, ntails);
                unpin_user_page(head, ntails);
        }

And maybe you open-code that iteration, but that basic idea of finding a
compound_head and its ntails should be the computational work performed.

No reason not to fix set_page_dirty_lock() too while you are here.

Also, this patch and the next can be completely independent of the rest
of the series; it is valuable regardless of the other tricks. You can
split them and progress them independently.

... and I was just talking about this with Daniel Jordan and some other
people at your company :)

Thanks,
Jason
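As a standalone illustration of the grouping step being discussed, here
is a minimal userspace C sketch, not kernel code: head_of() is a
hypothetical stand-in for compound_head() (groups of four consecutive
"pages" share one head), and record_refs() mirrors the helper from the
patch, so the loop advances one compound at a time instead of one page
at a time.

        #include <stdio.h>

        /* Stand-in for compound_head(): pages here are plain ints, and
         * head_of() maps each "page" to its "compound head" by masking
         * the low bits, so groups of 4 consecutive pages share a head. */
        static int head_of(int page)
        {
                return page & ~3;
        }

        /* Analogue of record_refs() from the patch: count how many
         * entries at the start of pages[] share pages[0]'s head. */
        static int record_refs(const int *pages, int npages)
        {
                int head = head_of(pages[0]);
                int refs = 1;

                while (refs < npages && head_of(pages[refs]) == head)
                        refs++;
                return refs;
        }

        int main(void)
        {
                /* Pretend GUP returned two full compounds plus one page. */
                int pages[] = { 4, 5, 6, 7, 8, 9, 10, 11, 12 };
                int npages = sizeof(pages) / sizeof(pages[0]);
                int index, refs;

                /* The unpin loop strides by whole compounds: one "ref
                 * update" per head rather than one per page. */
                for (index = 0; index < npages; index += refs) {
                        refs = record_refs(pages + index, npages - index);
                        printf("unpin head %d with %d refs\n",
                               head_of(pages[index]), refs);
                }
                return 0;
        }

Running it prints one release per compound (head 4 with 4 refs, head 8
with 4 refs, head 12 with 1 ref), which is the access pattern the patch
gives the unpin paths.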
On 12/8/20 11:34 AM, Jason Gunthorpe wrote:
> On Tue, Dec 08, 2020 at 05:28:59PM +0000, Joao Martins wrote:
>> Rather than decrementing the ref count one by one, we
>> walk the page array and check which belong to the same
>> compound_head. Later on we decrement the calculated amount
>> of references in a single write to the head page.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>  mm/gup.c | 41 ++++++++++++++++++++++++++++++++---------
>>  1 file changed, 32 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 194e6981eb03..3a9a7229f418 100644
>> +++ b/mm/gup.c
>> @@ -212,6 +212,18 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>>  }
>>  #endif /* CONFIG_DEV_PAGEMAP_OPS */
>>
>> +static int record_refs(struct page **pages, int npages)
>> +{
>> +        struct page *head = compound_head(pages[0]);
>> +        int refs = 1, index;
>> +
>> +        for (index = 1; index < npages; index++, refs++)
>> +                if (compound_head(pages[index]) != head)
>> +                        break;
>> +
>> +        return refs;
>> +}
>> +
>>  /**
>>   * unpin_user_page() - release a dma-pinned page
>>   * @page: pointer to page to be released
>> @@ -221,9 +233,9 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>>   * that such pages can be separately tracked and uniquely handled. In
>>   * particular, interactions with RDMA and filesystems need special handling.
>>   */
>> -void unpin_user_page(struct page *page)
>> +static void __unpin_user_page(struct page *page, int refs)
>
> Refs should be unsigned everywhere.

That's fine (although, see my comments in the previous patch for
pitfalls). But it should be a preparatory patch, in order to avoid
clouding up this one and your others as well.

> I suggest using clear language: 'page' here should always be a compound
> head called 'head' (or do we have another common variable name for
> this?)

Agreed. Matthew's struct folio upgrade will allow us to really make
things clear in a typesafe way, but meanwhile, it's probably good to use
one of the following patterns:

        page = compound_head(page); // at the very beginning of a routine

or

        do_things_to_this_single_page(page);

        head = compound_head(page);
        do_things_to_this_compound_page(head);

> 'refs' is the number of tail pages within the compound, so 'ntails' or
> something

I think it's OK to leave it as "refs", because within gup.c, refs has a
very particular meaning. But if you change to ntails or something, I'd
want to see a complete change: no leftovers of refs that are really
ntails. So far I'd rather leave it as refs, but it's not a big deal
either way.

>> {
>> -        int refs = 1;
>> +        int orig_refs = refs;
>>
>>         page = compound_head(page);
>
> The caller should always do this.
>
>> @@ -237,14 +249,19 @@ void unpin_user_page(struct page *page)
>>                 return;
>>
>>         if (hpage_pincount_available(page))
>> -                hpage_pincount_sub(page, 1);
>> +                hpage_pincount_sub(page, refs);

Maybe a nice touch would be to pass in orig_refs, because there is no
intention to use a possibly modified refs. So:

                hpage_pincount_sub(page, orig_refs);

...obviously a fine point, I realize. :)

>>         else
>> -                refs = GUP_PIN_COUNTING_BIAS;
>> +                refs *= GUP_PIN_COUNTING_BIAS;
>>
>>         if (page_ref_sub_and_test(page, refs))
>>                 __put_page(page);
>>
>> -        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
>> +        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, orig_refs);
>> +}
>
> And really this should be placed directly after try_grab_compound_head()
> and be given a similar name, 'unpin_compound_head()'. Even better would
> be to split the FOLL_PIN part into a function so there was a clear
> logical pairing.
>
> And reviewing it like that, I want to ask if this unpin sequence is in
> the right order... I would expect it to be the reverse order of the get.
>
> John?
>
> Is it safe to call mod_node_page_state() after releasing the refcount?
> This could race with hot-unplugging the struct pages, so I think it is
> wrong.

Yes, I think you are right! I wasn't in a hot unplug state of mind when
I thought about the ordering there, but I should have been. :)

>> +void unpin_user_page(struct page *page)
>> +{
>> +        __unpin_user_page(page, 1);
>
> Thus this is:
>
>         __unpin_user_page(compound_head(page), 1);
>
>> @@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>>                                  bool make_dirty)
>> {
>>         unsigned long index;
>> +        int refs = 1;
>>
>>         /*
>>          * TODO: this can be optimized for huge pages: if a series of pages is

I think you can delete this TODO block now, and the one in
unpin_user_pages(), as a result of these changes.

>> @@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>>                 return;
>>         }
>>
>> -        for (index = 0; index < npages; index++) {
>> +        for (index = 0; index < npages; index += refs) {
>>                 struct page *page = compound_head(pages[index]);
>> +
>
> I think this is really hard to read; it should end up as something like:
>
>         for_each_compound_head(page_list, page_list_len, &head, &ntails) {
>                 if (!PageDirty(head))
>                         set_page_dirty_lock(head, ntails);
>                 unpin_user_page(head, ntails);
>         }
>
> And maybe you open-code that iteration, but that basic idea of finding a
> compound_head and its ntails should be the computational work performed.
>
> No reason not to fix set_page_dirty_lock() too while you are here.

Eh? What's wrong with set_page_dirty_lock()?

> Also, this patch and the next can be completely independent of the rest
> of the series; it is valuable regardless of the other tricks. You can
> split them and progress them independently.
>
> ... and I was just talking about this with Daniel Jordan and some other
> people at your company :)
>
> Thanks,
> Jason

thanks,
On 12/8/20 7:34 PM, Jason Gunthorpe wrote:
> On Tue, Dec 08, 2020 at 05:28:59PM +0000, Joao Martins wrote:
>> Rather than decrementing the ref count one by one, we
>> walk the page array and check which belong to the same
>> compound_head. Later on we decrement the calculated amount
>> of references in a single write to the head page.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>  mm/gup.c | 41 ++++++++++++++++++++++++++++++++---------
>>  1 file changed, 32 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 194e6981eb03..3a9a7229f418 100644
>> +++ b/mm/gup.c
>> @@ -212,6 +212,18 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>>  }
>>  #endif /* CONFIG_DEV_PAGEMAP_OPS */
>>
>> +static int record_refs(struct page **pages, int npages)
>> +{
>> +        struct page *head = compound_head(pages[0]);
>> +        int refs = 1, index;
>> +
>> +        for (index = 1; index < npages; index++, refs++)
>> +                if (compound_head(pages[index]) != head)
>> +                        break;
>> +
>> +        return refs;
>> +}
>> +
>>  /**
>>   * unpin_user_page() - release a dma-pinned page
>>   * @page: pointer to page to be released
>> @@ -221,9 +233,9 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>>   * that such pages can be separately tracked and uniquely handled. In
>>   * particular, interactions with RDMA and filesystems need special handling.
>>   */
>> -void unpin_user_page(struct page *page)
>> +static void __unpin_user_page(struct page *page, int refs)
>
> Refs should be unsigned everywhere.

/me nods

> I suggest using clear language: 'page' here should always be a compound
> head called 'head' (or do we have another common variable name for
> this?)
>
> 'refs' is the number of tail pages within the compound, so 'ntails' or
> something

The usage of 'refs' seems to align with the rest of the GUP code. It's
always referring to tail pages, and the unpin case isn't any different,
IIUC. I suppose we can always change that, but maybe better to do that
renaming in one shot as a post-cleanup?

>> {
>> -        int refs = 1;
>> +        int orig_refs = refs;
>>
>>         page = compound_head(page);
>
> The caller should always do this.

/me nods

>> @@ -237,14 +249,19 @@ void unpin_user_page(struct page *page)
>>                 return;
>>
>>         if (hpage_pincount_available(page))
>> -                hpage_pincount_sub(page, 1);
>> +                hpage_pincount_sub(page, refs);
>>         else
>> -                refs = GUP_PIN_COUNTING_BIAS;
>> +                refs *= GUP_PIN_COUNTING_BIAS;
>>
>>         if (page_ref_sub_and_test(page, refs))
>>                 __put_page(page);
>>
>> -        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
>> +        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, orig_refs);
>> +}
>
> And really this should be placed directly after try_grab_compound_head()
> and be given a similar name, 'unpin_compound_head()'. Even better would
> be to split the FOLL_PIN part into a function so there was a clear
> logical pairing.
>
> And reviewing it like that, I want to ask if this unpin sequence is in
> the right order... I would expect it to be the reverse order of the get.
>
> John?
>
> Is it safe to call mod_node_page_state() after releasing the refcount?
> This could race with hot-unplugging the struct pages, so I think it is
> wrong.

It appears to be the case, based on John's follow-up comment.

>> +void unpin_user_page(struct page *page)
>> +{
>> +        __unpin_user_page(page, 1);
>
> Thus this is:
>
>         __unpin_user_page(compound_head(page), 1);

Got it.

>> @@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>>                                  bool make_dirty)
>> {
>>         unsigned long index;
>> +        int refs = 1;
>>
>>         /*
>>          * TODO: this can be optimized for huge pages: if a series of pages is
>> @@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>>                 return;
>>         }
>>
>> -        for (index = 0; index < npages; index++) {
>> +        for (index = 0; index < npages; index += refs) {
>>                 struct page *page = compound_head(pages[index]);
>> +
>
> I think this is really hard to read; it should end up as something like:
>
>         for_each_compound_head(page_list, page_list_len, &head, &ntails) {
>                 if (!PageDirty(head))
>                         set_page_dirty_lock(head, ntails);
>                 unpin_user_page(head, ntails);
>         }

/me nods. Let me attempt that.

> And maybe you open-code that iteration, but that basic idea of finding a
> compound_head and its ntails should be the computational work performed.

I like the idea of a page range API alternative to unpin_user_pages(),
but improving the current unpin_user_pages() would improve other unpin
users too. Perhaps the logic can be common: the current
unpin_user_pages() would keep the second iteration part, while the new
(faster) API would be based on computation.

> No reason not to fix set_page_dirty_lock() too while you are here.

OK.

> Also, this patch and the next can be completely independent of the rest
> of the series; it is valuable regardless of the other tricks. You can
> split them and progress them independently.

Yeap, let me do that.

> ... and I was just talking about this with Daniel Jordan and some other
> people at your company :)

:)
On Tue, Dec 08, 2020 at 09:06:50PM -0800, John Hubbard wrote:
>> I suggest using clear language: 'page' here should always be a compound
>> head called 'head' (or do we have another common variable name for
>> this?)
>
> Agreed. Matthew's struct folio upgrade will allow us to really make
> things clear in a typesafe way, but meanwhile, it's probably good to use
> one of the following patterns:

Yes, this fits very well with the folio patches, and is much clearer.

>         page = compound_head(page); // at the very beginning of a routine

No, these routines really want to operate on heads/folios; that is the
whole point.

>         do_things_to_this_single_page(page);
>
>         head = compound_head(page);
>         do_things_to_this_compound_page(head);

Yes, but wordy, though.

>> Is it safe to call mod_node_page_state() after releasing the refcount?
>> This could race with hot-unplugging the struct pages, so I think it is
>> wrong.
>
> Yes, I think you are right! I wasn't in a hot unplug state of mind when
> I thought about the ordering there, but I should have been. :)

Ok.

Hmm... unpin_user_page() and put_compound_head() do exactly the same
thing, and the latter gets it all right. I'll make a patch to fix this.

>> And maybe you open-code that iteration, but that basic idea of finding
>> a compound_head and its ntails should be the computational work
>> performed.
>>
>> No reason not to fix set_page_dirty_lock() too while you are here.
>
> Eh? What's wrong with set_page_dirty_lock()?

Look at the code:

        for (index = 0; index < npages; index++) {
                struct page *page = compound_head(pages[index]);

                if (!PageDirty(page))
                        set_page_dirty_lock(page);

So we really want:

        set_folio_dirty_lock(folio, ntails);

just like:

        unpin_user_folio(folio, ntails);

(wow, this is much clearer to explain using Matt's language)

set_page_dirty_lock() does another whack of atomics, so this should be
a healthy speedup on the large page benchmark.

Jason
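For reference, the set_page_dirty_lock() being discussed looked roughly
like the following in mm/page-writeback.c at the time; this is a sketch
from that era's code, not a verbatim quote. Each per-tail-page call pays
a page lock/unlock round trip on top of ->set_page_dirty() itself, which
is the per-page cost Jason is pointing at:

        /* mm/page-writeback.c (roughly, as of this thread) */
        int set_page_dirty_lock(struct page *page)
        {
                int ret;

                lock_page(page);        /* atomic bit op; may sleep on contention */
                ret = set_page_dirty(page);
                unlock_page(page);      /* another atomic op, plus possible wakeup */
                return ret;
        }

A batched variant taking an ntails count would pay that lock/unlock cost
once per compound instead of once per base page.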
On 12/8/20 7:34 PM, Jason Gunthorpe wrote:
>> @@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>>                                  bool make_dirty)
>> {
>>         unsigned long index;
>> +        int refs = 1;
>>
>>         /*
>>          * TODO: this can be optimized for huge pages: if a series of pages is
>> @@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>>                 return;
>>         }
>>
>> -        for (index = 0; index < npages; index++) {
>> +        for (index = 0; index < npages; index += refs) {
>>                 struct page *page = compound_head(pages[index]);
>> +
>
> I think this is really hard to read; it should end up as something like:
>
>         for_each_compound_head(page_list, page_list_len, &head, &ntails) {
>                 if (!PageDirty(head))
>                         set_page_dirty_lock(head, ntails);
>                 unpin_user_page(head, ntails);
>         }
>
> And maybe you open-code that iteration, but that basic idea of finding a
> compound_head and its ntails should be the computational work performed.
>
> No reason not to fix set_page_dirty_lock() too while you are here.

The whack of atomics you referred to earlier, I suppose, ends up being
account_page_dirtied(). See the partial diff at the end.

I was looking at the latter part and renaming all the fs that supply
set_page_dirty()... But now my concern is whether it's really safe to
assume that filesystems that supply it ... indeed have the ability to
dirty @ntails pages.

Functionally, fixing set_page_dirty_lock() means we don't call
set_page_dirty(head) @ntails times as happens today; we would only call
it once, with ntails as an argument.

Perhaps the safest thing to do is still to iterate over @ntails and call
->set_page_dirty(page), and instead introduce a set_page_range_dirty()
which individual filesystems can separately supply, giving precedence to
->set_page_range_dirty() over ->set_page_dirty()?

	Joao

--------------------->8------------------------------

diff --git a/mm/gup.c b/mm/gup.c
index 41ab3d48e1bb..5f8a0f16ab62 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -295,7 +295,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		 * next writeback cycle. This is harmless.
 		 */
 		if (!PageDirty(head))
-			set_page_dirty_lock(head);
+			set_page_range_dirty_lock(head, ntails);
 		put_compound_head(head, ntails, FOLL_PIN);
 	}
 }
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 088729ea80b2..4642d037f657 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2417,7 +2417,8 @@ int __set_page_dirty_no_writeback(struct page *page, unsigned int ntails)
  *
  * NOTE: This relies on being atomic wrt interrupts.
  */
-void account_page_dirtied(struct page *page, struct address_space *mapping)
+void account_page_dirtied(struct page *page, struct address_space *mapping,
+		unsigned int ntails)
 {
 	struct inode *inode = mapping->host;
 
@@ -2425,17 +2426,18 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 	if (mapping_can_writeback(mapping)) {
 		struct bdi_writeback *wb;
+		int nr = ntails + 1;
 
 		inode_attach_wb(inode, page);
 		wb = inode_to_wb(inode);
 
-		__inc_lruvec_page_state(page, NR_FILE_DIRTY);
-		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
-		__inc_node_page_state(page, NR_DIRTIED);
-		inc_wb_stat(wb, WB_RECLAIMABLE);
-		inc_wb_stat(wb, WB_DIRTIED);
-		task_io_account_write(PAGE_SIZE);
-		current->nr_dirtied++;
+		mod_lruvec_page_state(page, NR_FILE_DIRTY, nr);
+		mod_zone_page_state(page_zone(page), NR_ZONE_WRITE_PENDING, nr);
+		mod_node_page_state(page_pgdat(page), NR_DIRTIED, nr);
+		__add_wb_stat(wb, WB_RECLAIMABLE, nr);
+		__add_wb_stat(wb, WB_DIRTIED, nr);
+		task_io_account_write(nr * PAGE_SIZE);
+		current->nr_dirtied += nr;
 		this_cpu_inc(bdp_ratelimits);
 
 		mem_cgroup_track_foreign_dirty(page, wb);
@@ -2485,7 +2487,7 @@ int __set_page_dirty_nobuffers(struct page *page, unsigned int ntails)
 		xa_lock_irqsave(&mapping->i_pages, flags);
 		BUG_ON(page_mapping(page) != mapping);
 		WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
-		account_page_dirtied(page, mapping);
+		account_page_dirtied(page, mapping, ntails);
 		__xa_set_mark(&mapping->i_pages, page_index(page),
 				PAGECACHE_TAG_DIRTY);
 		xa_unlock_irqrestore(&mapping->i_pages, flags);
@@ -2624,6 +2626,27 @@ int set_page_dirty_lock(struct page *page)
 }
 EXPORT_SYMBOL(set_page_dirty_lock);
 
+/*
+ * set_page_range_dirty() is racy if the caller has no reference against
+ * page->mapping->host, and if the page is unlocked. This is because another
+ * CPU could truncate the page off the mapping and then free the mapping.
+ *
+ * Usually, the page _is_ locked, or the caller is a user-space process which
+ * holds a reference on the inode by having an open file.
+ *
+ * In other cases, the page should be locked before running
+ * set_page_range_dirty().
+ */
+int set_page_range_dirty_lock(struct page *page, unsigned int ntails)
+{
+	int ret;
+
+	lock_page(page);
+	ret = set_page_range_dirty(page, ntails);
+	unlock_page(page);
+	return ret;
+}
+EXPORT_SYMBOL(set_page_range_dirty_lock);
+
On Thu, Dec 17, 2020 at 07:05:37PM +0000, Joao Martins wrote:
>> No reason not to fix set_page_dirty_lock() too while you are here.
>
> The whack of atomics you referred to earlier, I suppose, ends up being
> account_page_dirtied(). See the partial diff at the end.

Well, even just eliminating the lock_page, page_mapping, PageDirty,
etc. is already a big win.

If mapping->a_ops->set_page_dirty() needs to be called multiple times
on the head page, I'd probably just suggest:

        while (ntails--)
                rc |= (*spd)(head);

At least as a start. If you have workloads that have page_mapping !=
NULL, then look at another series to optimize that. It looks a bit
large, though, given the number of places implementing set_page_dirty.

I think the current reality is that calling set_page_dirty on an actual
file system is busted anyhow, so I think mapping is generally going to
be NULL here?

Jason
On 12/17/20 8:05 PM, Jason Gunthorpe wrote:
> On Thu, Dec 17, 2020 at 07:05:37PM +0000, Joao Martins wrote:
>>> No reason not to fix set_page_dirty_lock() too while you are here.
>>
>> The whack of atomics you referred to earlier, I suppose, ends up being
>> account_page_dirtied(). See the partial diff at the end.
>
> Well, even just eliminating the lock_page, page_mapping, PageDirty,
> etc. is already a big win.
>
> If mapping->a_ops->set_page_dirty() needs to be called multiple times
> on the head page, I'd probably just suggest:
>
>         while (ntails--)
>                 rc |= (*spd)(head);
>
> At least as a start.

/me nods

> If you have workloads that have page_mapping != NULL, then look at
> another series to optimize that. It looks a bit large, though, given
> the number of places implementing set_page_dirty.

Yes. I don't have a particular workload; I was just wondering what you
had in mind, as at a glance, changing all the places without messing up
filesystems looks like the subject of a separate series.

> I think the current reality is that calling set_page_dirty on an actual
> file system is busted anyhow, so I think mapping is generally going to
> be NULL here?

Perhaps -- I'll have to check.

	Joao
On Thu, Dec 17, 2020 at 10:34:54PM +0000, Joao Martins wrote:
> On 12/17/20 8:05 PM, Jason Gunthorpe wrote:
>> On Thu, Dec 17, 2020 at 07:05:37PM +0000, Joao Martins wrote:
>>>> No reason not to fix set_page_dirty_lock() too while you are here.
>>>
>>> The whack of atomics you referred to earlier, I suppose, ends up being
>>> account_page_dirtied(). See the partial diff at the end.
>>
>> Well, even just eliminating the lock_page, page_mapping, PageDirty,
>> etc. is already a big win.
>>
>> If mapping->a_ops->set_page_dirty() needs to be called multiple times
>> on the head page, I'd probably just suggest:
>>
>>         while (ntails--)
>>                 rc |= (*spd)(head);
>>
>> At least as a start.
>
> /me nods

You might just be able to call it once as well; doesn't the page fault
handler dirty an entire compound in one call? Someone from the FS world
probably knows :)

Good example of Matt's folio concept lending clarity: if this accepted a
folio, we'd be certain it only needed to be called once.

Jason
On 12/17/20 12:05 PM, Jason Gunthorpe wrote:
> On Thu, Dec 17, 2020 at 07:05:37PM +0000, Joao Martins wrote:
>>> No reason not to fix set_page_dirty_lock() too while you are here.
>>
>> The whack of atomics you referred to earlier, I suppose, ends up being
>> account_page_dirtied(). See the partial diff at the end.
>
> Well, even just eliminating the lock_page, page_mapping, PageDirty,
> etc. is already a big win.
>
> If mapping->a_ops->set_page_dirty() needs to be called multiple times
> on the head page, I'd probably just suggest:
>
>         while (ntails--)
>                 rc |= (*spd)(head);

I think once should be enough. There is no counter for page dirtiness,
and this kind of accounting is always tracked in the head page, so there
is no reason to repeatedly call set_page_dirty() from the same spot.

thanks,
On 12/19/20 2:06 AM, John Hubbard wrote:
> On 12/17/20 12:05 PM, Jason Gunthorpe wrote:
>> On Thu, Dec 17, 2020 at 07:05:37PM +0000, Joao Martins wrote:
>>>> No reason not to fix set_page_dirty_lock() too while you are here.
>>>
>>> The whack of atomics you referred to earlier, I suppose, ends up being
>>> account_page_dirtied(). See the partial diff at the end.
>>
>> Well, even just eliminating the lock_page, page_mapping, PageDirty,
>> etc. is already a big win.
>>
>> If mapping->a_ops->set_page_dirty() needs to be called multiple times
>> on the head page, I'd probably just suggest:
>>
>>         while (ntails--)
>>                 rc |= (*spd)(head);
>
> I think once should be enough. There is no counter for page dirtiness,
> and this kind of accounting is always tracked in the head page, so there
> is no reason to repeatedly call set_page_dirty() from the same spot.

I think that's what we do even today, considering the Dirty bit is only
set on the compound head (regardless of accounting). Even without this
patch, IIUC, we don't call a second set_page_dirty(head) after the first
time we dirty it. So probably there's no optimization to do here, as you
say.

	Joao
diff --git a/mm/gup.c b/mm/gup.c
index 194e6981eb03..3a9a7229f418 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -212,6 +212,18 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
 }
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
+static int record_refs(struct page **pages, int npages)
+{
+	struct page *head = compound_head(pages[0]);
+	int refs = 1, index;
+
+	for (index = 1; index < npages; index++, refs++)
+		if (compound_head(pages[index]) != head)
+			break;
+
+	return refs;
+}
+
 /**
  * unpin_user_page() - release a dma-pinned page
  * @page: pointer to page to be released
@@ -221,9 +233,9 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
  * that such pages can be separately tracked and uniquely handled. In
  * particular, interactions with RDMA and filesystems need special handling.
  */
-void unpin_user_page(struct page *page)
+static void __unpin_user_page(struct page *page, int refs)
 {
-	int refs = 1;
+	int orig_refs = refs;
 
 	page = compound_head(page);
 
@@ -237,14 +249,19 @@ void unpin_user_page(struct page *page)
 		return;
 
 	if (hpage_pincount_available(page))
-		hpage_pincount_sub(page, 1);
+		hpage_pincount_sub(page, refs);
 	else
-		refs = GUP_PIN_COUNTING_BIAS;
+		refs *= GUP_PIN_COUNTING_BIAS;
 
 	if (page_ref_sub_and_test(page, refs))
 		__put_page(page);
 
-	mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
+	mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, orig_refs);
+}
+
+void unpin_user_page(struct page *page)
+{
+	__unpin_user_page(page, 1);
 }
 EXPORT_SYMBOL(unpin_user_page);
 
@@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 				 bool make_dirty)
 {
 	unsigned long index;
+	int refs = 1;
 
 	/*
 	 * TODO: this can be optimized for huge pages: if a series of pages is
@@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		return;
 	}
 
-	for (index = 0; index < npages; index++) {
+	for (index = 0; index < npages; index += refs) {
 		struct page *page = compound_head(pages[index]);
+
 		/*
 		 * Checking PageDirty at this point may race with
 		 * clear_page_dirty_for_io(), but that's OK. Two key
@@ -310,7 +329,8 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		 */
 		if (!PageDirty(page))
 			set_page_dirty_lock(page);
-		unpin_user_page(page);
+		refs = record_refs(pages + index, npages - index);
+		__unpin_user_page(page, refs);
 	}
 }
 EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
@@ -327,6 +347,7 @@ EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
 void unpin_user_pages(struct page **pages, unsigned long npages)
 {
 	unsigned long index;
+	int refs = 1;
 
 	/*
 	 * If this WARN_ON() fires, then the system *might* be leaking pages (by
@@ -340,8 +361,10 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 	 * physically contiguous and part of the same compound page, then a
 	 * single operation to the head page should suffice.
 	 */
-	for (index = 0; index < npages; index++)
-		unpin_user_page(pages[index]);
+	for (index = 0; index < npages; index += refs) {
+		refs = record_refs(pages + index, npages - index);
+		__unpin_user_page(pages[index], refs);
+	}
 }
 EXPORT_SYMBOL(unpin_user_pages);
Rather than decrementing the ref count one by one, we walk the page
array and check which pages belong to the same compound_head. Later on,
we decrement the calculated amount of references in a single write to
the head page.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 mm/gup.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)
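To put rough numbers on the win, consider the !hpage_pincount_available()
path, where every pin holds GUP_PIN_COUNTING_BIAS references on the head
page. The figures below assume a fully pinned 2 MiB compound of 512
4 KiB pages; the bias value of 1U << 10 matches the era's
include/linux/mm.h, and this is a back-of-the-envelope sketch rather
than a benchmark:

        #include <stdio.h>

        /* Value of GUP_PIN_COUNTING_BIAS (1U << 10) in include/linux/mm.h. */
        #define GUP_PIN_COUNTING_BIAS (1U << 10)

        int main(void)
        {
                unsigned int npages = 512;  /* 4 KiB pages in one 2 MiB compound */

                /* Old loop: one atomic sub on the head page per base page,
                 * plus one NR_FOLL_PIN_RELEASED update per base page. */
                printf("before: %u atomic subs of %u each, %u stat updates\n",
                       npages, GUP_PIN_COUNTING_BIAS, npages);

                /* With record_refs(): the whole group is released in one go. */
                printf("after:  1 atomic sub of %u, 1 stat update\n",
                       npages * GUP_PIN_COUNTING_BIAS);
                return 0;
        }

That is, 512 read-modify-write cycles on the same head-page cacheline
collapse into a single page_ref_sub_and_test() of 512 * 1024 = 524288,
plus one node stat update, per compound.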