Message ID | 20230201081737.2330141-6-fengwei.yin@intel.com
---|---
State | New
Series | folio based filemap_map_pages()
On Wed, Feb 01, 2023 at 04:17:37PM +0800, Yin Fengwei wrote:
> Use do_set_pte_range() in filemap_map_folio_range(). Which
> batched updates mm counter, rmap.
>
> With a self cooked will-it-scale.page_fault3 like app (change
> file write fault to read fault) got 15% performance gain.

I'd suggest that you create a will-it-scale.page_fault4.  Anton
is quite open to adding new variations of tests.

> Perf data collected before/after the change:
>
>  18.73%--page_add_file_rmap
>          |
>           --11.60%--__mod_lruvec_page_state
>                     |
>                     |--7.40%--__mod_memcg_lruvec_state
>                     |          |
>                     |           --5.58%--cgroup_rstat_updated
>                     |
>                      --2.53%--__mod_lruvec_state
>                                |
>                                 --1.48%--__mod_node_page_state
>
>   9.93%--page_add_file_rmap_range
>          |
>           --2.67%--__mod_lruvec_page_state
>                    |
>                    |--1.95%--__mod_memcg_lruvec_state
>                    |          |
>                    |           --1.57%--cgroup_rstat_updated
>                    |
>                     --0.61%--__mod_lruvec_state
>                               |
>                                --0.54%--__mod_node_page_state
>
> The running time of __mode_lruvec_page_state() is reduced a lot.

Nice.

> +++ b/mm/filemap.c
> @@ -3364,11 +3364,22 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  	struct file *file = vma->vm_file;
>  	struct page *page = folio_page(folio, start);
>  	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
> -	unsigned int ref_count = 0, count = 0;
> +	unsigned int ref_count = 0, count = 0, nr_mapped = 0;
>
>  	do {
> -		if (PageHWPoison(page))
> +		if (PageHWPoison(page)) {
> +			if (nr_mapped) {
> +				vmf->pte -= nr_mapped;
> +				do_set_pte_range(vmf, folio,
> +					start + count - nr_mapped,
> +					addr - nr_mapped * PAGE_SIZE,
> +					nr_mapped);
> +
> +			}
> +
> +			nr_mapped = 0;
>  			continue;
> +		}

Having subtracted nr_mapped from vmf->pte, we then need to add it again.

But this is all too complicated.  What if we don't update vmf->pte
each time around the loop?  ie something like this:

	do {
		if (PageHWPoisoned(page))
			goto map;
		if (mmap_miss > 0)
			mmap_miss--;
		/*
		 * If there're PTE markers, we'll leave them to be handled
		 * in the specific fault path, and it'll prohibit the
		 * fault-around logic.
		 */
		if (!pte_none(vmf->pte[count]))
			goto map;
		if (vmf->address == addr + count * PAGE_SIZE)
			ret = VM_FAULT_NOPAGE;
		continue;
map:
		if (count > 1) {
			do_set_pte_range(vmf, folio, start, addr, count - 1);
			folio_ref_add(folio, count - 1);
		}
		start += count;
		vmf->pte += count;
		addr += count * PAGE_SIZE;
		nr_pages -= count;
		count = 0;
	} while (page++, ++count < nr_pages);

	if (count > 0) {
		do_set_pte_range(vmf, folio, start, addr, count);
		folio_ref_add(folio, count);
	} else {
		/* Make sure the PTE points to the correct page table */
		vmf->pte--;
	}
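For context, the "self cooked will-it-scale.page_fault3 like app" mentioned
above amounts to taking read faults on a file-backed mapping in a tight loop.
A rough standalone sketch of such a page_fault4-style test is shown below; it
is only an illustration, not the actual will-it-scale testcase (the real
tests plug into that project's own harness), and the file size, path and
report interval are arbitrary choices.

/*
 * Rough sketch of a page_fault4-style read-fault test.  Hypothetical,
 * standalone version; the real will-it-scale tests use the project's
 * own harness instead of main().
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAPLEN	(128UL * 1024 * 1024)

int main(void)
{
	char path[] = "/tmp/page_fault4-XXXXXX";
	char buf[4096];
	int fd = mkstemp(path);
	unsigned long i, sum = 0, iterations = 0;
	long page_size = sysconf(_SC_PAGESIZE);

	if (fd < 0 || ftruncate(fd, MAPLEN) < 0) {
		perror("setup");
		return 1;
	}
	unlink(path);

	/* Populate the page cache once so later faults hit uptodate pages. */
	memset(buf, 1, sizeof(buf));
	for (i = 0; i < MAPLEN; i += sizeof(buf))
		if (pwrite(fd, buf, sizeof(buf), i) != (ssize_t)sizeof(buf)) {
			perror("pwrite");
			return 1;
		}

	for (;;) {
		char *p = mmap(NULL, MAPLEN, PROT_READ, MAP_SHARED, fd, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* Touch every page with a load: read fault, not write fault. */
		for (i = 0; i < MAPLEN; i += page_size)
			sum += p[i];
		munmap(p, MAPLEN);
		if (++iterations % 100 == 0)
			printf("%lu iterations (sum %lu)\n", iterations, sum);
	}
}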
On 2/1/2023 11:50 PM, Matthew Wilcox wrote:
> On Wed, Feb 01, 2023 at 04:17:37PM +0800, Yin Fengwei wrote:
>> Use do_set_pte_range() in filemap_map_folio_range(). Which
>> batched updates mm counter, rmap.
>>
>> With a self cooked will-it-scale.page_fault3 like app (change
>> file write fault to read fault) got 15% performance gain.
>
> I'd suggest that you create a will-it-scale.page_fault4.  Anton
> is quite open to adding new variations of tests.
OK. I will do this.

>
>> Perf data collected before/after the change:
>>
>>  18.73%--page_add_file_rmap
>>          |
>>           --11.60%--__mod_lruvec_page_state
>>                     |
>>                     |--7.40%--__mod_memcg_lruvec_state
>>                     |          |
>>                     |           --5.58%--cgroup_rstat_updated
>>                     |
>>                      --2.53%--__mod_lruvec_state
>>                                |
>>                                 --1.48%--__mod_node_page_state
>>
>>   9.93%--page_add_file_rmap_range
>>          |
>>           --2.67%--__mod_lruvec_page_state
>>                    |
>>                    |--1.95%--__mod_memcg_lruvec_state
>>                    |          |
>>                    |           --1.57%--cgroup_rstat_updated
>>                    |
>>                     --0.61%--__mod_lruvec_state
>>                               |
>>                                --0.54%--__mod_node_page_state
>>
>> The running time of __mode_lruvec_page_state() is reduced a lot.
>
> Nice.
>
>> +++ b/mm/filemap.c
>> @@ -3364,11 +3364,22 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  	struct file *file = vma->vm_file;
>>  	struct page *page = folio_page(folio, start);
>>  	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>> -	unsigned int ref_count = 0, count = 0;
>> +	unsigned int ref_count = 0, count = 0, nr_mapped = 0;
>>
>>  	do {
>> -		if (PageHWPoison(page))
>> +		if (PageHWPoison(page)) {
>> +			if (nr_mapped) {
>> +				vmf->pte -= nr_mapped;
>> +				do_set_pte_range(vmf, folio,
>> +					start + count - nr_mapped,
>> +					addr - nr_mapped * PAGE_SIZE,
>> +					nr_mapped);
>> +
>> +			}
>> +
>> +			nr_mapped = 0;
>>  			continue;
>> +		}
>
> Having subtracted nr_mapped from vmf->pte, we then need to add it again.
>
> But this is all too complicated.  What if we don't update vmf->pte
> each time around the loop?  ie something like this:
Thanks a lot for the suggestion. I like vmf->pte[count] a lot. Will
update the code accordingly in next version.


Regards
Yin, Fengwei

>
> 	do {
> 		if (PageHWPoisoned(page))
> 			goto map;
> 		if (mmap_miss > 0)
> 			mmap_miss--;
> 		/*
> 		 * If there're PTE markers, we'll leave them to be handled
> 		 * in the specific fault path, and it'll prohibit the
> 		 * fault-around logic.
> 		 */
> 		if (!pte_none(vmf->pte[count]))
> 			goto map;
> 		if (vmf->address == addr + count * PAGE_SIZE)
> 			ret = VM_FAULT_NOPAGE;
> 		continue;
> map:
> 		if (count > 1) {
> 			do_set_pte_range(vmf, folio, start, addr, count - 1);
> 			folio_ref_add(folio, count - 1);
> 		}
> 		start += count;
> 		vmf->pte += count;
> 		addr += count * PAGE_SIZE;
> 		nr_pages -= count;
> 		count = 0;
> 	} while (page++, ++count < nr_pages);
>
> 	if (count > 0) {
> 		do_set_pte_range(vmf, folio, start, addr, count);
> 		folio_ref_add(folio, count);
> 	} else {
> 		/* Make sure the PTE points to the correct page table */
> 		vmf->pte--;
> 	}
On 2/1/2023 11:50 PM, Matthew Wilcox wrote:
> On Wed, Feb 01, 2023 at 04:17:37PM +0800, Yin Fengwei wrote:
>> Use do_set_pte_range() in filemap_map_folio_range(). Which
>> batched updates mm counter, rmap.
>>
>> With a self cooked will-it-scale.page_fault3 like app (change
>> file write fault to read fault) got 15% performance gain.
>
> I'd suggest that you create a will-it-scale.page_fault4.  Anton
> is quite open to adding new variations of tests.
>
>> Perf data collected before/after the change:
>>
>>  18.73%--page_add_file_rmap
>>          |
>>           --11.60%--__mod_lruvec_page_state
>>                     |
>>                     |--7.40%--__mod_memcg_lruvec_state
>>                     |          |
>>                     |           --5.58%--cgroup_rstat_updated
>>                     |
>>                      --2.53%--__mod_lruvec_state
>>                                |
>>                                 --1.48%--__mod_node_page_state
>>
>>   9.93%--page_add_file_rmap_range
>>          |
>>           --2.67%--__mod_lruvec_page_state
>>                    |
>>                    |--1.95%--__mod_memcg_lruvec_state
>>                    |          |
>>                    |           --1.57%--cgroup_rstat_updated
>>                    |
>>                     --0.61%--__mod_lruvec_state
>>                               |
>>                                --0.54%--__mod_node_page_state
>>
>> The running time of __mode_lruvec_page_state() is reduced a lot.
>
> Nice.
>
>> +++ b/mm/filemap.c
>> @@ -3364,11 +3364,22 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  	struct file *file = vma->vm_file;
>>  	struct page *page = folio_page(folio, start);
>>  	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>> -	unsigned int ref_count = 0, count = 0;
>> +	unsigned int ref_count = 0, count = 0, nr_mapped = 0;
>>
>>  	do {
>> -		if (PageHWPoison(page))
>> +		if (PageHWPoison(page)) {
>> +			if (nr_mapped) {
>> +				vmf->pte -= nr_mapped;
>> +				do_set_pte_range(vmf, folio,
>> +					start + count - nr_mapped,
>> +					addr - nr_mapped * PAGE_SIZE,
>> +					nr_mapped);
>> +
>> +			}
>> +
>> +			nr_mapped = 0;
>>  			continue;
>> +		}
>
> Having subtracted nr_mapped from vmf->pte, we then need to add it again.
>
> But this is all too complicated.  What if we don't update vmf->pte
> each time around the loop?  ie something like this:
>
> 	do {
> 		if (PageHWPoisoned(page))
> 			goto map;
> 		if (mmap_miss > 0)
> 			mmap_miss--;
> 		/*
> 		 * If there're PTE markers, we'll leave them to be handled
> 		 * in the specific fault path, and it'll prohibit the
> 		 * fault-around logic.
> 		 */
> 		if (!pte_none(vmf->pte[count]))
> 			goto map;
> 		if (vmf->address == addr + count * PAGE_SIZE)
> 			ret = VM_FAULT_NOPAGE;
> 		continue;
> map:
> 		if (count > 1) {
> 			do_set_pte_range(vmf, folio, start, addr, count - 1);
> 			folio_ref_add(folio, count - 1);
> 		}
> 		start += count;
> 		vmf->pte += count;
> 		addr += count * PAGE_SIZE;
> 		nr_pages -= count;
> 		count = 0;
I found it's not easy to make this part correct. So I tried to simplify
the logic here a little bit.


Regards
Yin, Fengwei

> 	} while (page++, ++count < nr_pages);
>
> 	if (count > 0) {
> 		do_set_pte_range(vmf, folio, start, addr, count);
> 		folio_ref_add(folio, count);
> 	} else {
> 		/* Make sure the PTE points to the correct page table */
> 		vmf->pte--;
> 	}
diff --git a/mm/filemap.c b/mm/filemap.c
index 95f634d11581..b14f077d1c55 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3364,11 +3364,22 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	struct file *file = vma->vm_file;
 	struct page *page = folio_page(folio, start);
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
-	unsigned int ref_count = 0, count = 0;
+	unsigned int ref_count = 0, count = 0, nr_mapped = 0;
 
 	do {
-		if (PageHWPoison(page))
+		if (PageHWPoison(page)) {
+			if (nr_mapped) {
+				vmf->pte -= nr_mapped;
+				do_set_pte_range(vmf, folio,
+					start + count - nr_mapped,
+					addr - nr_mapped * PAGE_SIZE,
+					nr_mapped);
+
+			}
+
+			nr_mapped = 0;
 			continue;
+		}
 
 		if (mmap_miss > 0)
 			mmap_miss--;
@@ -3378,16 +3389,34 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		 * handled in the specific fault path, and it'll prohibit the
 		 * fault-around logic.
 		 */
-		if (!pte_none(*vmf->pte))
+		if (!pte_none(*vmf->pte)) {
+			if (nr_mapped) {
+				vmf->pte -= nr_mapped;
+				do_set_pte_range(vmf, folio,
+					start + count - nr_mapped,
+					addr - nr_mapped * PAGE_SIZE,
+					nr_mapped);
+
+			}
+
+			nr_mapped = 0;
+
 			continue;
+		}
 
 		if (vmf->address == addr)
 			ret = VM_FAULT_NOPAGE;
 
 		ref_count++;
-		do_set_pte(vmf, page, addr);
+		nr_mapped++;
 	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
 
+	if (nr_mapped) {
+		vmf->pte -= nr_mapped;
+		do_set_pte_range(vmf, folio, start + count - nr_mapped,
+				addr - nr_mapped * PAGE_SIZE, nr_mapped);
+	}
+
 	/*
 	 * Restore the vmf->pte. Otherwise, it's possible vmf->pte point
 	 * to next page table entry if the last sub page in the range is
Use do_set_pte_range() in filemap_map_folio_range(), which batches the
mm counter and rmap updates.

With a self-cooked will-it-scale.page_fault3-like app (file write fault
changed to read fault), a 15% performance gain was observed.

Perf data collected before/after the change:

 18.73%--page_add_file_rmap
         |
          --11.60%--__mod_lruvec_page_state
                    |
                    |--7.40%--__mod_memcg_lruvec_state
                    |          |
                    |           --5.58%--cgroup_rstat_updated
                    |
                     --2.53%--__mod_lruvec_state
                               |
                                --1.48%--__mod_node_page_state

  9.93%--page_add_file_rmap_range
         |
          --2.67%--__mod_lruvec_page_state
                   |
                   |--1.95%--__mod_memcg_lruvec_state
                   |          |
                   |           --1.57%--cgroup_rstat_updated
                   |
                    --0.61%--__mod_lruvec_state
                              |
                               --0.54%--__mod_node_page_state

The running time of __mod_lruvec_page_state() is reduced significantly.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/filemap.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)
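The core of the change is an accumulate-and-flush pattern: consecutive
mappable pages are only counted in nr_mapped, and the pending run is handed
to do_set_pte_range() in one call whenever a page has to be skipped
(HWPoison, or a PTE that is already populated) or the loop ends.  A small
userspace illustration of that control flow, with hypothetical names
(item_mappable, map_range) standing in for the kernel helpers rather than
real kernel APIs, might look like this:

/*
 * Illustration of the accumulate-and-flush batching pattern: walk a range,
 * count consecutive "mappable" items, and flush the pending run whenever an
 * item must be skipped or the walk ends.  Names are hypothetical stand-ins.
 */
#include <stdbool.h>
#include <stdio.h>

static bool item_mappable(int item)
{
	return item >= 0;	/* negative items play the PageHWPoison role */
}

static void map_range(int start, int nr)
{
	/* Stand-in for do_set_pte_range(): one call covers 'nr' items. */
	printf("map [%d, %d)\n", start, start + nr);
}

static void walk(const int *items, int nr_items)
{
	int i, nr_batched = 0;

	for (i = 0; i < nr_items; i++) {
		if (!item_mappable(items[i])) {
			/* Flush the run accumulated so far, then skip. */
			if (nr_batched)
				map_range(i - nr_batched, nr_batched);
			nr_batched = 0;
			continue;
		}
		nr_batched++;
	}
	/* Flush whatever is still pending when the walk finishes. */
	if (nr_batched)
		map_range(nr_items - nr_batched, nr_batched);
}

int main(void)
{
	int items[] = { 1, 2, 3, -1, 4, 5, -1, -1, 6 };

	walk(items, sizeof(items) / sizeof(items[0]));
	return 0;
}

Running this prints map [0, 3), map [4, 6) and map [8, 9): three batched
calls instead of six per-item ones, which is the same effect the patch aims
for with the per-page rmap and counter updates.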