Message ID | 20250417161216.88318-4-urezki@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/4] lib/test_vmalloc.c: Replace RWSEM to SRCU for setup | expand |
On 04/17/25 at 06:12pm, Uladzislau Rezki (Sony) wrote: > Currently both atomics share one cache-line: > > <snip> > ... > ffffffff83eab400 b vmap_lazy_nr > ffffffff83eab408 b nr_vmalloc_pages > ... > <snip> > > those are global variables and they are only 8 bytes apart. > Since they are modified by different threads this causes a > false sharing. This can lead to a performance drop due to > unnecessary cache invalidations. > > After this patch it is aligned to a cache line boundary: > > <snip> > ... > ffffffff8260a600 d vmap_lazy_nr > ffffffff8260a640 d nr_vmalloc_pages A great catch and improvement. Reviewed-by: Baoquan He <bhe@redhat.com> > ... > <snip> > > Cc: Mateusz Guzik <mjguzik@gmail.com> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> > --- > mm/vmalloc.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index 77da4613f07ff..54f60d62051da 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1008,7 +1008,8 @@ static BLOCKING_NOTIFIER_HEAD(vmap_notify_list); > static void drain_vmap_area_work(struct work_struct *work); > static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work); > > -static atomic_long_t nr_vmalloc_pages; > +static __cacheline_aligned_in_smp atomic_long_t nr_vmalloc_pages; > +static __cacheline_aligned_in_smp atomic_long_t vmap_lazy_nr; > > unsigned long vmalloc_nr_pages(void) > { > @@ -2117,8 +2118,6 @@ static unsigned long lazy_max_pages(void) > return log * (32UL * 1024 * 1024 / PAGE_SIZE); > } > > -static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0); > - > /* > * Serialize vmap purging. There is no actual critical section protected > * by this lock, but we want to avoid concurrent calls for performance > -- > 2.39.5 >
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 77da4613f07ff..54f60d62051da 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1008,7 +1008,8 @@ static BLOCKING_NOTIFIER_HEAD(vmap_notify_list); static void drain_vmap_area_work(struct work_struct *work); static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work); -static atomic_long_t nr_vmalloc_pages; +static __cacheline_aligned_in_smp atomic_long_t nr_vmalloc_pages; +static __cacheline_aligned_in_smp atomic_long_t vmap_lazy_nr; unsigned long vmalloc_nr_pages(void) { @@ -2117,8 +2118,6 @@ static unsigned long lazy_max_pages(void) return log * (32UL * 1024 * 1024 / PAGE_SIZE); } -static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0); - /* * Serialize vmap purging. There is no actual critical section protected * by this lock, but we want to avoid concurrent calls for performance
Currently both atomics share one cache-line: <snip> ... ffffffff83eab400 b vmap_lazy_nr ffffffff83eab408 b nr_vmalloc_pages ... <snip> those are global variables and they are only 8 bytes apart. Since they are modified by different threads this causes a false sharing. This can lead to a performance drop due to unnecessary cache invalidations. After this patch it is aligned to a cache line boundary: <snip> ... ffffffff8260a600 d vmap_lazy_nr ffffffff8260a640 d nr_vmalloc_pages ... <snip> Cc: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> --- mm/vmalloc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)