Message ID | 20190927234008.11513-4-leonardo@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduces new count-based method for monitoring lockless pagetable walks | expand |
On Fri, Sep 27, 2019 at 08:40:00PM -0300, Leonardo Bras wrote:
> As decribed, gup_pgd_range is a lockless pagetable walk. So, in order to
^ typo
On Mon, 2019-09-30 at 14:09 +0300, Kirill A. Shutemov wrote: > On Fri, Sep 27, 2019 at 08:40:00PM -0300, Leonardo Bras wrote: > > As decribed, gup_pgd_range is a lockless pagetable walk. So, in order to > ^ typo > Fixed, thanks!
On 9/27/19 4:40 PM, Leonardo Bras wrote: > As decribed, gup_pgd_range is a lockless pagetable walk. So, in order to > monitor against THP split/collapse with the couting method, it's necessary s/couting/counting/ > to bound it with {start,end}_lockless_pgtbl_walk. > > There are dummy functions, so it is not going to add any overhead on archs > that don't use this method. > > Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> > --- > mm/gup.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/gup.c b/mm/gup.c > index 98f13ab37bac..7105c829cf44 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2325,6 +2325,7 @@ static bool gup_fast_permitted(unsigned long start, unsigned long end) > int __get_user_pages_fast(unsigned long start, int nr_pages, int write, > struct page **pages) > { > + struct mm_struct *mm; I don't think that this local variable adds any value, so let's not use it. Similar point in a few other patches too. > unsigned long len, end; > unsigned long flags; > int nr = 0; > @@ -2352,9 +2353,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, > > if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && > gup_fast_permitted(start, end)) { > + mm = current->mm; > + start_lockless_pgtbl_walk(mm); > local_irq_save(flags); > gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr); > local_irq_restore(flags); > + end_lockless_pgtbl_walk(mm); > } > > return nr; > @@ -2404,6 +2408,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > unsigned int gup_flags, struct page **pages) > { > unsigned long addr, len, end; > + struct mm_struct *mm; Same here. > int nr = 0, ret = 0; > > if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) > @@ -2421,9 +2426,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > > if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && > gup_fast_permitted(start, end)) { > + mm = current->mm; > + start_lockless_pgtbl_walk(mm); Minor: I'd like to rename this register_lockless_pgtable_walker(). > local_irq_disable(); > gup_pgd_range(addr, end, gup_flags, pages, &nr); > local_irq_enable(); > + end_lockless_pgtbl_walk(mm); ...and deregister_lockless_pgtable_walker(). thanks,
On Mon, 2019-09-30 at 14:51 -0700, John Hubbard wrote: > On 9/27/19 4:40 PM, Leonardo Bras wrote: > > As decribed, gup_pgd_range is a lockless pagetable walk. So, in order to > > monitor against THP split/collapse with the couting method, it's necessary > > s/couting/counting/ > Thanks, fixed for v5. > > to bound it with {start,end}_lockless_pgtbl_walk. > > > > There are dummy functions, so it is not going to add any overhead on archs > > that don't use this method. > > > > Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> > > --- > > mm/gup.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/mm/gup.c b/mm/gup.c > > index 98f13ab37bac..7105c829cf44 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -2325,6 +2325,7 @@ static bool gup_fast_permitted(unsigned long start, unsigned long end) > > int __get_user_pages_fast(unsigned long start, int nr_pages, int write, > > struct page **pages) > > { > > + struct mm_struct *mm; > > I don't think that this local variable adds any value, so let's not use it. > Similar point in a few other patches too. It avoids 1 deference of current->mm, it's a little performance gain. > > > unsigned long len, end; > > unsigned long flags; > > int nr = 0; > > @@ -2352,9 +2353,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, > > > > if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && > > gup_fast_permitted(start, end)) { > > + mm = current->mm; > > + start_lockless_pgtbl_walk(mm); > > local_irq_save(flags); > > gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr); > > local_irq_restore(flags); > > + end_lockless_pgtbl_walk(mm); > > } > > > > return nr; > > @@ -2404,6 +2408,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > > unsigned int gup_flags, struct page **pages) > > { > > unsigned long addr, len, end; > > + struct mm_struct *mm; > > Same here. > > > int nr = 0, ret = 0; > > > > if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) > > @@ -2421,9 +2426,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > > > > if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && > > gup_fast_permitted(start, end)) { > > + mm = current->mm; > > + start_lockless_pgtbl_walk(mm); > > Minor: I'd like to rename this register_lockless_pgtable_walker(). > > > local_irq_disable(); > > gup_pgd_range(addr, end, gup_flags, pages, &nr); > > local_irq_enable(); > > + end_lockless_pgtbl_walk(mm); > > ...and deregister_lockless_pgtable_walker(). > I have no problem changing the name, but I don't register/deregister are good terms for this. I would rather use start/finish, begin/end, and so on. Register sounds like something more complicated than what we are trying to achieve here. > > thanks, Thank you!
On 10/1/19 10:56 AM, Leonardo Bras wrote: > On Mon, 2019-09-30 at 14:51 -0700, John Hubbard wrote: >> On 9/27/19 4:40 PM, Leonardo Bras wrote: ... >>> diff --git a/mm/gup.c b/mm/gup.c >>> index 98f13ab37bac..7105c829cf44 100644 >>> --- a/mm/gup.c >>> +++ b/mm/gup.c >>> @@ -2325,6 +2325,7 @@ static bool gup_fast_permitted(unsigned long start, unsigned long end) >>> int __get_user_pages_fast(unsigned long start, int nr_pages, int write, >>> struct page **pages) >>> { >>> + struct mm_struct *mm; >> >> I don't think that this local variable adds any value, so let's not use it. >> Similar point in a few other patches too. > > It avoids 1 deference of current->mm, it's a little performance gain. > No, it isn't. :) Longer answer: at this level (by which I mean, "wrote the C code, haven't looked at the generated asm yet, and haven't done a direct perf test yet"), none of us C programmers are entitled to imagine that we can second guess both the compiler and the CPU well enough to claim that declaring a local pointer variable on the stack will even *affect* performance, much less know which way it will go! The compiler at -O2 will *absolutely* optimize away any local variables that it doesn't need. And that leads to how kernel programmers routinely decide about that kind of variable: "does the variable's added clarity compensate for the extra visual noise and for the need to manage the variable?" Here, and in most (all?) other points in the patchset where you've added an mm local variable, the answer is no. >> ... start_lockless_pgtbl_walk(mm); >> >> Minor: I'd like to rename this register_lockless_pgtable_walker(). >> >>> local_irq_disable(); >>> gup_pgd_range(addr, end, gup_flags, pages, &nr); >>> local_irq_enable(); >>> + end_lockless_pgtbl_walk(mm); >> >> ...and deregister_lockless_pgtable_walker(). >> > > I have no problem changing the name, but I don't register/deregister > are good terms for this. > > I would rather use start/finish, begin/end, and so on. Register sounds > like something more complicated than what we are trying to achieve > here. > OK, well, I don't want to bikeshed on naming more than I usually do, and what you have is reasonable, so I'll leave that alone. :) thanks,
On Tue, 2019-10-01 at 12:04 -0700, John Hubbard wrote: > On 10/1/19 10:56 AM, Leonardo Bras wrote: > > On Mon, 2019-09-30 at 14:51 -0700, John Hubbard wrote: > > > On 9/27/19 4:40 PM, Leonardo Bras wrote: > ... > > > > diff --git a/mm/gup.c b/mm/gup.c > > > > index 98f13ab37bac..7105c829cf44 100644 > > > > --- a/mm/gup.c > > > > +++ b/mm/gup.c > > > > @@ -2325,6 +2325,7 @@ static bool gup_fast_permitted(unsigned long start, unsigned long end) > > > > int __get_user_pages_fast(unsigned long start, int nr_pages, int write, > > > > struct page **pages) > > > > { > > > > + struct mm_struct *mm; > > > > > > I don't think that this local variable adds any value, so let's not use it. > > > Similar point in a few other patches too. > > > > It avoids 1 deference of current->mm, it's a little performance gain. > > > > No, it isn't. :) > > Longer answer: at this level (by which I mean, "wrote the C code, haven't looked > at the generated asm yet, and haven't done a direct perf test yet"), none of us > C programmers are entitled to imagine that we can second guess both the compiler > and the CPU well enough to claim that declaring a local pointer variable on the > stack will even *affect* performance, much less know which way it will go! > I did this based on how costly can be 'current', and I could notice reduction in assembly size most of the time. (powerpc) But I get what you mean, maybe the (possible) performance gain don't worth the extra work. > The compiler at -O2 will *absolutely* optimize away any local variables that > it doesn't need. > > And that leads to how kernel programmers routinely decide about that kind of > variable: "does the variable's added clarity compensate for the extra visual > noise and for the need to manage the variable?" That's a good way to decide it. :) > > Here, and in most (all?) other points in the patchset where you've added an > mm local variable, the answer is no. > Well, IMHO it's cleaner that way. But I get that other people may disagree. > > ... start_lockless_pgtbl_walk(mm); > > > Minor: I'd like to rename this register_lockless_pgtable_walker(). > > > > > > > local_irq_disable(); > > > > gup_pgd_range(addr, end, gup_flags, pages, &nr); > > > > local_irq_enable(); > > > > + end_lockless_pgtbl_walk(mm); > > > > > > ...and deregister_lockless_pgtable_walker(). > > > > > > > I have no problem changing the name, but I don't register/deregister > > are good terms for this. > > > > I would rather use start/finish, begin/end, and so on. Register sounds > > like something more complicated than what we are trying to achieve > > here. > > > > OK, well, I don't want to bikeshed on naming more than I usually do, and > what you have is reasonable, so I'll leave that alone. :) > > thanks, Thank for the feedback,
diff --git a/mm/gup.c b/mm/gup.c index 98f13ab37bac..7105c829cf44 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2325,6 +2325,7 @@ static bool gup_fast_permitted(unsigned long start, unsigned long end) int __get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages) { + struct mm_struct *mm; unsigned long len, end; unsigned long flags; int nr = 0; @@ -2352,9 +2353,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) { + mm = current->mm; + start_lockless_pgtbl_walk(mm); local_irq_save(flags); gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr); local_irq_restore(flags); + end_lockless_pgtbl_walk(mm); } return nr; @@ -2404,6 +2408,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages) { unsigned long addr, len, end; + struct mm_struct *mm; int nr = 0, ret = 0; if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) @@ -2421,9 +2426,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages, if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) { + mm = current->mm; + start_lockless_pgtbl_walk(mm); local_irq_disable(); gup_pgd_range(addr, end, gup_flags, pages, &nr); local_irq_enable(); + end_lockless_pgtbl_walk(mm); ret = nr; }
As decribed, gup_pgd_range is a lockless pagetable walk. So, in order to monitor against THP split/collapse with the couting method, it's necessary to bound it with {start,end}_lockless_pgtbl_walk. There are dummy functions, so it is not going to add any overhead on archs that don't use this method. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> --- mm/gup.c | 8 ++++++++ 1 file changed, 8 insertions(+)