[3/7] khugepaged: Drain LRU add pagevec to get rid of extra pins
diff mbox series

Message ID 20200327170601.18563-4-kirill.shutemov@linux.intel.com
State New
Headers show
Series
  • thp/khugepaged improvements and CoW semantics
Related show

Commit Message

Kirill A. Shutemov March 27, 2020, 5:05 p.m. UTC
__collapse_huge_page_isolate() may fail due to extra pin in the LRU add
pagevec. It's petty common for swapin case: we swap in pages just to
fail due to the extra pin.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Zi Yan March 27, 2020, 5:34 p.m. UTC | #1
On 27 Mar 2020, at 13:05, Kirill A. Shutemov wrote:

> __collapse_huge_page_isolate() may fail due to extra pin in the LRU add
> pagevec. It's petty common for swapin case: we swap in pages just to
> fail due to the extra pin.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 14d7afc90786..39e0994abeb8 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -585,11 +585,19 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  		 * The page must only be referenced by the scanned process
>  		 * and page swap cache.
>  		 */
> +		if (page_count(page) != 1 + PageSwapCache(page)) {
> +			/*
> +			 * Drain pagevec and retry just in case we can get rid
> +			 * of the extra pin, like in swapin case.
> +			 */
> +			lru_add_drain();
> +		}
>  		if (page_count(page) != 1 + PageSwapCache(page)) {
>  			unlock_page(page);
>  			result = SCAN_PAGE_COUNT;
>  			goto out;
>  		}
> +
>  		if (pte_write(pteval)) {
>  			writable = true;
>  		} else {
> -- 
> 2.26.0

Looks good to me. Is the added empty line intentional?

Reviewed-by: Zi Yan <ziy@nvidia.com>

—
Best Regards,
Yan Zi
Yang Shi March 27, 2020, 6:10 p.m. UTC | #2
On Fri, Mar 27, 2020 at 10:06 AM Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> __collapse_huge_page_isolate() may fail due to extra pin in the LRU add
> pagevec. It's petty common for swapin case: we swap in pages just to
> fail due to the extra pin.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 14d7afc90786..39e0994abeb8 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -585,11 +585,19 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                  * The page must only be referenced by the scanned process
>                  * and page swap cache.
>                  */
> +               if (page_count(page) != 1 + PageSwapCache(page)) {
> +                       /*
> +                        * Drain pagevec and retry just in case we can get rid
> +                        * of the extra pin, like in swapin case.
> +                        */
> +                       lru_add_drain();

This is definitely correct.

I'm wondering if we need one more lru_add_drain() before PageLRU check
in khugepaged_scan_pmd() or not? The page might be in lru cache then
get skipped. This would improve the success rate.

Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>

> +               }
>                 if (page_count(page) != 1 + PageSwapCache(page)) {
>                         unlock_page(page);
>                         result = SCAN_PAGE_COUNT;
>                         goto out;
>                 }
> +
>                 if (pte_write(pteval)) {
>                         writable = true;
>                 } else {
> --
> 2.26.0
>
>
Kirill A. Shutemov March 28, 2020, 12:20 a.m. UTC | #3
On Fri, Mar 27, 2020 at 01:34:20PM -0400, Zi Yan wrote:
> On 27 Mar 2020, at 13:05, Kirill A. Shutemov wrote:
> 
> > __collapse_huge_page_isolate() may fail due to extra pin in the LRU add
> > pagevec. It's petty common for swapin case: we swap in pages just to
> > fail due to the extra pin.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/khugepaged.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 14d7afc90786..39e0994abeb8 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -585,11 +585,19 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >  		 * The page must only be referenced by the scanned process
> >  		 * and page swap cache.
> >  		 */
> > +		if (page_count(page) != 1 + PageSwapCache(page)) {
> > +			/*
> > +			 * Drain pagevec and retry just in case we can get rid
> > +			 * of the extra pin, like in swapin case.
> > +			 */
> > +			lru_add_drain();
> > +		}
> >  		if (page_count(page) != 1 + PageSwapCache(page)) {
> >  			unlock_page(page);
> >  			result = SCAN_PAGE_COUNT;
> >  			goto out;
> >  		}
> > +
> >  		if (pte_write(pteval)) {
> >  			writable = true;
> >  		} else {
> > -- 
> > 2.26.0
> 
> Looks good to me. Is the added empty line intentional?

Yes. It groups try and retry together.
Kirill A. Shutemov March 28, 2020, 12:18 p.m. UTC | #4
On Fri, Mar 27, 2020 at 11:10:40AM -0700, Yang Shi wrote:
> On Fri, Mar 27, 2020 at 10:06 AM Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> >
> > __collapse_huge_page_isolate() may fail due to extra pin in the LRU add
> > pagevec. It's petty common for swapin case: we swap in pages just to
> > fail due to the extra pin.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/khugepaged.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 14d7afc90786..39e0994abeb8 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -585,11 +585,19 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                  * The page must only be referenced by the scanned process
> >                  * and page swap cache.
> >                  */
> > +               if (page_count(page) != 1 + PageSwapCache(page)) {
> > +                       /*
> > +                        * Drain pagevec and retry just in case we can get rid
> > +                        * of the extra pin, like in swapin case.
> > +                        */
> > +                       lru_add_drain();
> 
> This is definitely correct.
> 
> I'm wondering if we need one more lru_add_drain() before PageLRU check
> in khugepaged_scan_pmd() or not? The page might be in lru cache then
> get skipped. This would improve the success rate.

Could you elaborate on the scenario, I don't follow.
Yang Shi March 30, 2020, 6:30 p.m. UTC | #5
On Sat, Mar 28, 2020 at 5:18 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Fri, Mar 27, 2020 at 11:10:40AM -0700, Yang Shi wrote:
> > On Fri, Mar 27, 2020 at 10:06 AM Kirill A. Shutemov
> > <kirill@shutemov.name> wrote:
> > >
> > > __collapse_huge_page_isolate() may fail due to extra pin in the LRU add
> > > pagevec. It's petty common for swapin case: we swap in pages just to
> > > fail due to the extra pin.
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > ---
> > >  mm/khugepaged.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 14d7afc90786..39e0994abeb8 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -585,11 +585,19 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > >                  * The page must only be referenced by the scanned process
> > >                  * and page swap cache.
> > >                  */
> > > +               if (page_count(page) != 1 + PageSwapCache(page)) {
> > > +                       /*
> > > +                        * Drain pagevec and retry just in case we can get rid
> > > +                        * of the extra pin, like in swapin case.
> > > +                        */
> > > +                       lru_add_drain();
> >
> > This is definitely correct.
> >
> > I'm wondering if we need one more lru_add_drain() before PageLRU check
> > in khugepaged_scan_pmd() or not? The page might be in lru cache then
> > get skipped. This would improve the success rate.
>
> Could you elaborate on the scenario, I don't follow.

I mean the below change:

--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1195,6 +1195,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
                        goto out_unmap;
                }
                khugepaged_node_load[node]++;
+               if (!PageLRU(page))
+                       /* Flush the page out of lru cache */
+                       lru_add_drain();
                if (!PageLRU(page)) {
                        result = SCAN_PAGE_LRU;
                        goto out_unmap;

If the page is not on LRU we even can't reach collapse_huge_page(), right?

>
>
> --
>  Kirill A. Shutemov
Kirill A. Shutemov March 30, 2020, 9:38 p.m. UTC | #6
On Mon, Mar 30, 2020 at 11:30:14AM -0700, Yang Shi wrote:
> On Sat, Mar 28, 2020 at 5:18 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > On Fri, Mar 27, 2020 at 11:10:40AM -0700, Yang Shi wrote:
> > > On Fri, Mar 27, 2020 at 10:06 AM Kirill A. Shutemov
> > > <kirill@shutemov.name> wrote:
> > > >
> > > > __collapse_huge_page_isolate() may fail due to extra pin in the LRU add
> > > > pagevec. It's petty common for swapin case: we swap in pages just to
> > > > fail due to the extra pin.
> > > >
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > ---
> > > >  mm/khugepaged.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index 14d7afc90786..39e0994abeb8 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -585,11 +585,19 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > > >                  * The page must only be referenced by the scanned process
> > > >                  * and page swap cache.
> > > >                  */
> > > > +               if (page_count(page) != 1 + PageSwapCache(page)) {
> > > > +                       /*
> > > > +                        * Drain pagevec and retry just in case we can get rid
> > > > +                        * of the extra pin, like in swapin case.
> > > > +                        */
> > > > +                       lru_add_drain();
> > >
> > > This is definitely correct.
> > >
> > > I'm wondering if we need one more lru_add_drain() before PageLRU check
> > > in khugepaged_scan_pmd() or not? The page might be in lru cache then
> > > get skipped. This would improve the success rate.
> >
> > Could you elaborate on the scenario, I don't follow.
> 
> I mean the below change:
> 
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1195,6 +1195,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>                         goto out_unmap;
>                 }
>                 khugepaged_node_load[node]++;
> +               if (!PageLRU(page))
> +                       /* Flush the page out of lru cache */
> +                       lru_add_drain();
>                 if (!PageLRU(page)) {
>                         result = SCAN_PAGE_LRU;
>                         goto out_unmap;
> 
> If the page is not on LRU we even can't reach collapse_huge_page(), right?

Yeah, I've archived the same by doing lru_add_drain_all() once per
khugepaged_do_scan(). It is more effective than lru_add_drain() inside
khugepaged_scan_pmd() and should have too much overhead.

The lru_add_drain() from this patch moved into swapin routine and called
only on success.

Patch
diff mbox series

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 14d7afc90786..39e0994abeb8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -585,11 +585,19 @@  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		 * The page must only be referenced by the scanned process
 		 * and page swap cache.
 		 */
+		if (page_count(page) != 1 + PageSwapCache(page)) {
+			/*
+			 * Drain pagevec and retry just in case we can get rid
+			 * of the extra pin, like in swapin case.
+			 */
+			lru_add_drain();
+		}
 		if (page_count(page) != 1 + PageSwapCache(page)) {
 			unlock_page(page);
 			result = SCAN_PAGE_COUNT;
 			goto out;
 		}
+
 		if (pte_write(pteval)) {
 			writable = true;
 		} else {