Message ID | 20200206030900.147032-11-leonardo@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | Introduces new functions for tracking lockless pagetable walks |
On 06/02/2020 at 04:08, Leonardo Bras wrote:
> Implements an additional feature to track lockless pagetable walks,
> using a per-cpu counter: lockless_pgtbl_walk_counter.
>
> Before a lockless pagetable walk, preemption is disabled and the
> current cpu's counter is increased.
> When the lockless pagetable walk finishes, the current cpu counter
> is decreased and the preemption is enabled.
>
> With that, it's possible to know in which cpus are happening lockless
> pagetable walks, and optimize serialize_against_pte_lookup().
>
> Implementation notes:
> - Every counter can be changed only by it's CPU
> - It makes use of the original memory barrier in the functions
> - Any counter can be read by any CPU
>
> Due to not locking nor using atomic variables, the impact on the
> lockless pagetable walk is intended to be minimum.

atomic variables have a lot less impact than preempt_enable/disable.

preempt_disable forces a re-scheduling, it really has impact. Why not use
atomic variables instead?

Christophe

> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/pgtable.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 535613030363..bb138b628f86 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -83,6 +83,7 @@ static void do_nothing(void *unused)
>
>  }
>
> +static DEFINE_PER_CPU(int, lockless_pgtbl_walk_counter);
>  /*
>   * Serialize against find_current_mm_pte which does lock-less
>   * lookup in page tables with local interrupts disabled. For huge pages
> @@ -120,6 +121,15 @@ unsigned long __begin_lockless_pgtbl_walk(bool disable_irq)
>  	if (disable_irq)
>  		local_irq_save(irq_mask);
>
> +	/*
> +	 * Counts this instance of lockless pagetable walk for this cpu.
> +	 * Disables preempt to make sure there is no cpu change between
> +	 * begin/end lockless pagetable walk, so that percpu counting
> +	 * works fine.
> +	 */
> +	preempt_disable();
> +	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))++;
> +
>  	/*
>  	 * This memory barrier pairs with any code that is either trying to
>  	 * delete page tables, or split huge pages. Without this barrier,
> @@ -158,6 +168,14 @@ inline void __end_lockless_pgtbl_walk(unsigned long irq_mask, bool enable_irq)
>  	 */
>  	smp_mb();
>
> +	/*
> +	 * Removes this instance of lockless pagetable walk for this cpu.
> +	 * Enables preempt only after end lockless pagetable walk,
> +	 * so that percpu counting works fine.
> +	 */
> +	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))--;
> +	preempt_enable();
> +
>  	/*
>  	 * Interrupts must be disabled during the lockless page table walk.
>  	 * That's because the deleting or splitting involves flushing TLBs,
Hello Christophe, thanks for the feedback!

On Thu, 2020-02-06 at 07:23 +0100, Christophe Leroy wrote:
> > Due to not locking nor using atomic variables, the impact on the
> > lockless pagetable walk is intended to be minimum.
>
> atomic variables have a lot less impact than preempt_enable/disable.
>
> preempt_disable forces a re-scheduling, it really has impact. Why not use
> atomic variables instead?

In fact, v5 of this patch used atomic variables, but it seemed to cause contention on a single exclusive cacheline, which performed no better than locking. (Discussion here: http://patchwork.ozlabs.org/patch/1171012/)

When I try to understand the effect of preempt_disable(), all I can see is a barrier() and possibly a preempt_count_inc(), which updates a member of the current thread struct if CONFIG_PREEMPT_COUNT is enabled. If CONFIG_PREEMPTION is also enabled, preempt_enable() can run __preempt_schedule() on unlikely(__preempt_count_dec_and_test()).

On most available configs, CONFIG_PREEMPTION is not set, being replaced either by CONFIG_PREEMPT_NONE (kernel defconfigs) or CONFIG_PREEMPT_VOLUNTARY (most supported distros). With that, CONFIG_PREEMPT_COUNT will most probably not be set either, and preempt_{en,dis}able() are replaced by a barrier().

With the preempt_disable approach, I intend to get better performance for the most common cases.

What do you think of it? I am still new to this subject, and I am still trying to better understand how it works. If you notice something I am missing, please let me know.

Best regards,
Leonardo Bras
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 535613030363..bb138b628f86 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -83,6 +83,7 @@ static void do_nothing(void *unused)

 }

+static DEFINE_PER_CPU(int, lockless_pgtbl_walk_counter);
 /*
  * Serialize against find_current_mm_pte which does lock-less
  * lookup in page tables with local interrupts disabled. For huge pages
@@ -120,6 +121,15 @@ unsigned long __begin_lockless_pgtbl_walk(bool disable_irq)
 	if (disable_irq)
 		local_irq_save(irq_mask);

+	/*
+	 * Counts this instance of lockless pagetable walk for this cpu.
+	 * Disables preempt to make sure there is no cpu change between
+	 * begin/end lockless pagetable walk, so that percpu counting
+	 * works fine.
+	 */
+	preempt_disable();
+	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))++;
+
 	/*
 	 * This memory barrier pairs with any code that is either trying to
 	 * delete page tables, or split huge pages. Without this barrier,
@@ -158,6 +168,14 @@ inline void __end_lockless_pgtbl_walk(unsigned long irq_mask, bool enable_irq)
 	 */
 	smp_mb();

+	/*
+	 * Removes this instance of lockless pagetable walk for this cpu.
+	 * Enables preempt only after end lockless pagetable walk,
+	 * so that percpu counting works fine.
+	 */
+	(*this_cpu_ptr(&lockless_pgtbl_walk_counter))--;
+	preempt_enable();
+
 	/*
 	 * Interrupts must be disabled during the lockless page table walk.
 	 * That's because the deleting or splitting involves flushing TLBs,
Implements an additional feature to track lockless pagetable walks, using a per-cpu counter: lockless_pgtbl_walk_counter.

Before a lockless pagetable walk, preemption is disabled and the current cpu's counter is increased. When the lockless pagetable walk finishes, the current cpu's counter is decreased and preemption is enabled.

With that, it's possible to know on which cpus lockless pagetable walks are happening, and to optimize serialize_against_pte_lookup().

Implementation notes:
- Every counter can be changed only by its CPU
- It makes use of the original memory barriers in the functions
- Any counter can be read by any CPU

Due to not locking nor using atomic variables, the impact on the lockless pagetable walk is intended to be minimal.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/pgtable.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)