Message ID | 1597061872-58724-1-git-send-email-xlpang@linux.alibaba.com (mailing list archive) |
---|---|
Series | mm/slub: Fix count_partial() problem |
On Mon, Aug 10, 2020 at 3:18 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
>
> v1->v2:
> - Improved changelog and variable naming for PATCH 1~2.
> - PATCH3 adds per-cpu counter to avoid performance regression
>   in concurrent __slab_free().
>
> [Testing]
> On my 32-cpu 2-socket physical machine:
> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
> perf stat --null --repeat 10 -- hackbench 20 thread 20000
>
> == original, not patched
> 19.211637055 seconds time elapsed ( +- 0.57% )
>
> == patched with patch1~2
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
> 21.731833146 seconds time elapsed ( +- 0.17% )
>
> == patched with patch1~3
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
>
> 19.112106847 seconds time elapsed ( +- 0.64% )
>
>
> Xunlei Pang (3):
>   mm/slub: Introduce two counters for partial objects
>   mm/slub: Get rid of count_partial()
>   mm/slub: Use percpu partial free counter
>
>  mm/slab.h |   2 +
>  mm/slub.c | 124 +++++++++++++++++++++++++++++++++++++++++++------------------
>  2 files changed, 89 insertions(+), 37 deletions(-)

We probably need to wrap the counters under CONFIG_SLUB_DEBUG because
AFAICT all the code that uses them is also wrapped under it.

An alternative approach for this patch would be to somehow make the
lock in count_partial() more granular, but I don't know how feasible
that actually is.

Anyway, I am OK with this approach:

Reviewed-by: Pekka Enberg <penberg@kernel.org>

You still need to convince Christoph, though, because he had
objections over this approach.

- Pekka
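For readers following the thread, here is a minimal user-space sketch of the trade-off being discussed: counting free objects by walking the partial list versus keeping counters that are updated as slabs and objects change. It is not the code from the series. The types `struct slab` and `struct node`, the field names, `NR_CPUS`, and all four functions are hypothetical stand-ins, and locking is omitted; in the kernel the walk would run under the node's list lock and the per-CPU slots would use the real per-CPU machinery.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for a slab on a node's partial list. */
struct slab {
    struct slab *next;
    int objects;   /* total objects in the slab */
    int inuse;     /* objects currently allocated */
};

/* Hypothetical stand-in for the per-node state. */
struct node {
    struct slab *partial;         /* head of the partial list */
    long partial_total;           /* total objects on partial slabs */
    long partial_free[NR_CPUS];   /* per-CPU share of the free-object count */
};

/* List-walk approach: cost grows with the length of the partial list. */
static long count_free_by_walk(const struct node *n)
{
    long free = 0;
    for (const struct slab *s = n->partial; s; s = s->next)
        free += s->objects - s->inuse;
    return free;
}

/* Counter approach: account for a slab when it joins the partial list. */
static void add_partial(struct node *n, struct slab *s, int cpu)
{
    s->next = n->partial;
    n->partial = s;
    n->partial_total += s->objects;
    n->partial_free[cpu] += s->objects - s->inuse;
}

/* Freeing an object touches only the freeing CPU's slot. */
static void free_object(struct node *n, struct slab *s, int cpu)
{
    s->inuse--;
    n->partial_free[cpu]++;
}

/* Reading sums the slots: cost is O(NR_CPUS), independent of list length. */
static long count_free_by_counter(const struct node *n)
{
    long free = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        free += n->partial_free[cpu];
    return free;
}
```

Both counting functions should agree at any quiescent point. The difference is that the counter read never has to traverse the list, and the free path updates only a CPU-local slot instead of a shared counter, which is the kind of contention the cover letter attributes the patch 1~2 slowdown to.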
On 2020/8/20 10:02 PM, Pekka Enberg wrote:
> We probably need to wrap the counters under CONFIG_SLUB_DEBUG because
> AFAICT all the code that uses them is also wrapped under it.

/sys/kernel/slab/***/partial in sysfs also uses it, so I can wrap it with
CONFIG_SLUB_DEBUG or CONFIG_SYSFS for backward compatibility.

> An alternative approach for this patch would be to somehow make the
> lock in count_partial() more granular, but I don't know how feasible
> that actually is.
>
> Anyway, I am OK with this approach:
>
> Reviewed-by: Pekka Enberg <penberg@kernel.org>

Thanks!

> You still need to convince Christoph, though, because he had
> objections over this approach.

Christoph, what do you think? Do you have a better suggestion for addressing
this *in production* issue?
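One way to read the "CONFIG_SLUB_DEBUG or CONFIG_SYSFS" suggestion is as a preprocessor guard that compiles the counter in when either option is set. The following is only a sketch under that assumption, not the code from the series: `PARTIAL_COUNTERS`, `struct cache_node`, `partial_free_objs`, and both helpers are hypothetical names, and `CONFIG_SYSFS` is defined by hand here to stand in for a kernel configuration.

```c
#include <assert.h>

/* Assumption for this sketch: pretend the kernel config enabled sysfs. */
#define CONFIG_SYSFS 1

/* Hypothetical helper macro: counters exist if either option is set. */
#if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SYSFS)
#define PARTIAL_COUNTERS 1
#endif

/* Hypothetical stand-in for the per-node structure. */
struct cache_node {
    unsigned long nr_partial;
#ifdef PARTIAL_COUNTERS
    long partial_free_objs;   /* hypothetical field name */
#endif
};

/* Update helper: compiles to a no-op when the counters are configured out. */
static inline void partial_free_add(struct cache_node *n, long delta)
{
#ifdef PARTIAL_COUNTERS
    n->partial_free_objs += delta;
#else
    (void)n;
    (void)delta;
#endif
}

/* Read helper: reports 0 when the counters are configured out. */
static inline long partial_free_read(const struct cache_node *n)
{
#ifdef PARTIAL_COUNTERS
    return n->partial_free_objs;
#else
    (void)n;
    return 0;
#endif
}
```

Routing every update through helpers like these keeps the hot paths free of `#ifdef` blocks, and a build with neither option enabled pays nothing for the counter.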
Any progress on this? The problem addressed by this patch has also
caused jitter in our online apps, which is quite annoying.

On Mon, Aug 24, 2020 at 6:05 PM xunlei <xlpang@linux.alibaba.com> wrote:
>
> On 2020/8/20 10:02 PM, Pekka Enberg wrote:
> > You still need to convince Christoph, though, because he had
> > objections over this approach.
>
> Christoph, what do you think? Do you have a better suggestion for addressing
> this *in production* issue?
On 3/1/21 6:31 PM, Shu Ming wrote:
> Any progress on this? The problem addressed by this patch has also
> caused jitter in our online apps, which is quite annoying.

Thanks for the attention.

There are some further improvements on top of v2; I'm going to send v3 out later.