Message ID: 20231213072805.74201-1-jianfeng.w.wang@oracle.com
State:      New
Series:     mm: remove redundant lru_add_drain() prior to unmapping pages
On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
> When unmapping VMA pages, pages will be gathered in batch and released by
> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> which calls lru_add_drain() to drain cached pages in folio_batch before
> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Remove lru_add_drain() prior to gathering and unmapping pages in
> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Note that the page unmapping process in oom_killer (e.g., in
> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> ---
>  mm/mmap.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1971bfffcc03..0451285dee4f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>  	struct mmu_gather tlb;
>  	unsigned long mt_start = mas->index;
>
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER

In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
is *not* set. So shouldn't this be

#ifndef CONFIG_MMU_GATHER_NO_GATHER ?

>  	lru_add_drain();
> +#endif
>  	tlb_gather_mmu(&tlb, mm);
>  	update_hiwater_rss(mm);
>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>  		return;
>  	}
>
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER

same question as above.

>  	lru_add_drain();
> +#endif
>  	flush_cache_mm(mm);
>  	tlb_gather_mmu_fullmm(&tlb, mm);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
On 12/13/23 2:57 PM, Tim Chen wrote:
> On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>
>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..0451285dee4f 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>
> In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> is *not* set. So shouldn't this be
>
> #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
>
Hi Tim,

The mmu_gather feature is used to gather pages produced by unmap_vmas() and
release them in batch in tlb_finish_mmu(). The feature is *on* if
CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that tlb_finish_mmu() will call
free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.

Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.

If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas(),
and tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary
to keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
folio_batches hold a reference count for the pages in them.

The same applies to the other case.

Thanks,
- Jianfeng

>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>
> same question as above.
>
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>
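For reference, the call path Jianfeng describes can be sketched as follows.
This is a simplified paraphrase of mm/mmu_gather.c and mm/swap_state.c, not
verbatim kernel source; function bodies are elided and the batch iteration
is collapsed into one call:

	/* Paraphrased sketch, not verbatim kernel code. */
	void tlb_finish_mmu(struct mmu_gather *tlb)
	{
		tlb_flush_mmu(tlb);	/* flushes the TLB and frees gathered pages */
		/* ... */
	}

	#ifndef CONFIG_MMU_GATHER_NO_GATHER
	/* mmu_gather is on: gathered pages are released here, in batch */
	static void tlb_batch_pages_flush(struct mmu_gather *tlb)
	{
		/* for each gathered batch of pages: */
		free_pages_and_swap_cache(pages, nr);
	}
	#endif

	void free_pages_and_swap_cache(struct page **pages, int nr)
	{
		lru_add_drain();	/* the drain the patch lets callers skip */
		/* ... free swap cache entries ... */
		release_pages(pages, nr);
	}

So when the mmu_gather feature is compiled in, the drain in unmap_region()
and exit_mmap() repeats work that tlb_finish_mmu() will do anyway.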
On Wed, 2023-12-13 at 17:03 -0800, Jianfeng Wang wrote:
> On 12/13/23 2:57 PM, Tim Chen wrote:
> > On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
> > > When unmapping VMA pages, pages will be gathered in batch and released by
> > > tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> > > tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> > > which calls lru_add_drain() to drain cached pages in folio_batch before
> > > releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> > > before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> > >
> > > Remove lru_add_drain() prior to gathering and unmapping pages in
> > > exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> > >
> > > Note that the page unmapping process in oom_killer (e.g., in
> > > __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> > > redundant lru_add_drain(). So, this commit makes the code more consistent.
> > >
> > > Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> > > ---
> > >  mm/mmap.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 1971bfffcc03..0451285dee4f 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
> > >  	struct mmu_gather tlb;
> > >  	unsigned long mt_start = mas->index;
> > >
> > > +#ifdef CONFIG_MMU_GATHER_NO_GATHER
> >
> > In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> > is *not* set. So shouldn't this be
> >
> > #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
> >
> Hi Tim,
>
> The mmu_gather feature is used to gather pages produced by unmap_vmas() and
> release them in batch in tlb_finish_mmu(). The feature is *on* if
> CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that tlb_finish_mmu() will call
> free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.

Thanks for the explanation.

Looking at the code, lru_add_drain() is executed under #ifndef
CONFIG_MMU_GATHER_NO_GATHER in tlb_finish_mmu(). So the logic of your patch
is fine.

The #ifndef CONFIG_MMU_GATHER_NO_GATHER means the mmu_gather feature is on.
The double negative threw me off on my first read of your commit log.

Suggest that you add a comment in the code to make future maintenance easier:

/* defer lru_add_drain() to tlb_finish_mmu() for ifndef CONFIG_MMU_GATHER_NO_GATHER */

Is your change of skipping the extra lru_add_drain() motivated by some
performance issue in a workload? I wonder whether it is worth adding an extra
ifdef in the code.

Tim

>
> Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.
>
> If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas(),
> and tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary
> to keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
> folio_batches hold a reference count for the pages in them.
>
> The same applies to the other case.
>
> Thanks,
> - Jianfeng
>
> > >  	lru_add_drain();
> > > +#endif
> > >  	tlb_gather_mmu(&tlb, mm);
> > >  	update_hiwater_rss(mm);
> > >  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> > > @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
> > >  		return;
> > >  	}
> > >
> > > +#ifdef CONFIG_MMU_GATHER_NO_GATHER
> >
> > same question as above.
> >
> > >  	lru_add_drain();
> > > +#endif
> > >  	flush_cache_mm(mm);
> > >  	tlb_gather_mmu_fullmm(&tlb, mm);
> > >  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> >
On 12/14/23 9:57 AM, Tim Chen wrote:
> On Wed, 2023-12-13 at 17:03 -0800, Jianfeng Wang wrote:
>> On 12/13/23 2:57 PM, Tim Chen wrote:
>>> On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>>
>>>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>>>> ---
>>>>  mm/mmap.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 1971bfffcc03..0451285dee4f 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>>>  	struct mmu_gather tlb;
>>>>  	unsigned long mt_start = mas->index;
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>>> is *not* set. So shouldn't this be
>>>
>>> #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
>>>
>> Hi Tim,
>>
>> The mmu_gather feature is used to gather pages produced by unmap_vmas() and
>> release them in batch in tlb_finish_mmu(). The feature is *on* if
>> CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that tlb_finish_mmu() will call
>> free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.
>
> Thanks for the explanation.
>
> Looking at the code, lru_add_drain() is executed under #ifndef
> CONFIG_MMU_GATHER_NO_GATHER in tlb_finish_mmu(). So the logic of your patch
> is fine.
>
> The #ifndef CONFIG_MMU_GATHER_NO_GATHER means the mmu_gather feature is on.
> The double negative threw me off on my first read of your commit log.
>
> Suggest that you add a comment in the code to make future maintenance easier:
>
> /* defer lru_add_drain() to tlb_finish_mmu() for ifndef CONFIG_MMU_GATHER_NO_GATHER */
>
> Is your change of skipping the extra lru_add_drain() motivated by some
> performance issue in a workload? I wonder whether it is worth adding an extra
> ifdef in the code.
>
> Tim
>

Okay, great suggestion.

We observe heavy contention on the LRU lock, introduced by lru_add_drain() and
release_pages(), in a prod workload, and we're trying to reduce the level of
contention. lru_add_drain() is a complex function that first takes a local CPU
lock and then iterates through *all* folio_batches to see if there are pages
to be moved to and between LRU lists. At that point, any page in these
folio_batches will trigger acquiring the per-LRU spin lock and increase the
level of lock contention. Applying the change avoids calling lru_add_drain()
unnecessarily, removing one source of that contention.

Together with the comment line you suggested, I believe this also improves
code readability by clarifying the mmu_gather feature.

- Jianfeng

>>
>> Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>> is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.
>>
>> If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas(),
>> and tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary
>> to keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
>> folio_batches hold a reference count for the pages in them.
>>
>> The same applies to the other case.
>>
>> Thanks,
>> - Jianfeng
>>
>>>>  	lru_add_drain();
>>>> +#endif
>>>>  	tlb_gather_mmu(&tlb, mm);
>>>>  	update_hiwater_rss(mm);
>>>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>>>> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>>>>  		return;
>>>>  	}
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> same question as above.
>>>
>>>>  	lru_add_drain();
>>>> +#endif
>>>>  	flush_cache_mm(mm);
>>>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>>>
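For context, the structure of lru_add_drain() that Jianfeng describes looks
roughly like this (paraphrased from mm/swap.c; abbreviated, not verbatim
kernel source):

	void lru_add_drain(void)
	{
		local_lock(&cpu_fbatches.lock);		/* the local CPU lock */
		lru_add_drain_cpu(smp_processor_id());	/* walks *all* folio_batches */
		local_unlock(&cpu_fbatches.lock);
	}

	/*
	 * For each non-empty batch, lru_add_drain_cpu() moves folios onto
	 * the LRU lists under the per-LRU (lruvec) spinlock -- the lock
	 * whose contention is reported above when many CPUs drain and
	 * release pages at once.
	 */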
diff --git a/mm/mmap.c b/mm/mmap.c
index 1971bfffcc03..0451285dee4f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
 	struct mmu_gather tlb;
 	unsigned long mt_start = mas->index;
 
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
@@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
 		return;
 	}
 
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	flush_cache_mm(mm);
 	tlb_gather_mmu_fullmm(&tlb, mm);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
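For illustration, a hypothetical v2 of the unmap_region() hunk with Tim's
suggested comment folded in might read as follows (no such v2 appears in
this thread):

	struct mmu_gather tlb;
	unsigned long mt_start = mas->index;

	/*
	 * When pages are gathered (ifndef CONFIG_MMU_GATHER_NO_GATHER),
	 * lru_add_drain() is deferred to tlb_finish_mmu(); drain here only
	 * when gathering is compiled out.
	 */
#ifdef CONFIG_MMU_GATHER_NO_GATHER
	lru_add_drain();
#endif
	tlb_gather_mmu(&tlb, mm);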
When unmapping VMA pages, pages will be gathered in batch and released by
tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
which calls lru_add_drain() to drain cached pages in folio_batch before
releasing gathered pages. Thus, it is redundant to call lru_add_drain()
before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.

Remove lru_add_drain() prior to gathering and unmapping pages in
exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.

Note that the page unmapping process in oom_killer (e.g., in
__oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
redundant lru_add_drain(). So, this commit makes the code more consistent.

Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
---
 mm/mmap.c | 4 ++++
 1 file changed, 4 insertions(+)