Message ID | 20240815014629.2685155-1-liaochang1@huawei.com (mailing list archive) |
---|---|
Series | uprobes: Improve scalability by reducing the contention on siglock |
Hi, Oleg

Kindly ping.

This series has been pending for a month. Is there any issue I have
overlooked?

Thanks.

On 2024/8/15 9:46, Liao Chang wrote:
> The profiling result of the BPF selftest on the ARM64 platform reveals
> that significant contention on current->sighand->siglock is the
> scalability bottleneck. The reason is straightforward: all producer
> threads of the benchmark have to contend for this spinlock in order to
> restore the TIF_SIGPENDING bit in thread_info that might have been
> removed in uprobe_deny_signal().
>
> The contention on current->sighand->siglock is unnecessary; this series
> removes it entirely. I've used the script developed by Andrii in [1]
> to run the benchmark. The CPU used was Kunpeng916 (Hi1616), 4 NUMA
> nodes, 64 cores@2.4GHz, running the kernel on the next tree + the
> optimization in [2] for get_xol_insn_slot().
>
> before-opt
> ----------
> uprobe-nop  ( 1 cpus):    0.907 ± 0.003M/s  (  0.907M/s/cpu)
> uprobe-nop  ( 2 cpus):    1.676 ± 0.008M/s  (  0.838M/s/cpu)
> uprobe-nop  ( 4 cpus):    3.210 ± 0.003M/s  (  0.802M/s/cpu)
> uprobe-nop  ( 8 cpus):    4.457 ± 0.003M/s  (  0.557M/s/cpu)
> uprobe-nop  (16 cpus):    3.724 ± 0.011M/s  (  0.233M/s/cpu)
> uprobe-nop  (32 cpus):    2.761 ± 0.003M/s  (  0.086M/s/cpu)
> uprobe-nop  (64 cpus):    1.293 ± 0.015M/s  (  0.020M/s/cpu)
>
> uprobe-push ( 1 cpus):    0.883 ± 0.001M/s  (  0.883M/s/cpu)
> uprobe-push ( 2 cpus):    1.642 ± 0.005M/s  (  0.821M/s/cpu)
> uprobe-push ( 4 cpus):    3.086 ± 0.002M/s  (  0.771M/s/cpu)
> uprobe-push ( 8 cpus):    3.390 ± 0.003M/s  (  0.424M/s/cpu)
> uprobe-push (16 cpus):    2.652 ± 0.005M/s  (  0.166M/s/cpu)
> uprobe-push (32 cpus):    2.713 ± 0.005M/s  (  0.085M/s/cpu)
> uprobe-push (64 cpus):    1.313 ± 0.009M/s  (  0.021M/s/cpu)
>
> uprobe-ret  ( 1 cpus):    1.774 ± 0.000M/s  (  1.774M/s/cpu)
> uprobe-ret  ( 2 cpus):    3.350 ± 0.001M/s  (  1.675M/s/cpu)
> uprobe-ret  ( 4 cpus):    6.604 ± 0.000M/s  (  1.651M/s/cpu)
> uprobe-ret  ( 8 cpus):    6.706 ± 0.005M/s  (  0.838M/s/cpu)
> uprobe-ret  (16 cpus):    5.231 ± 0.001M/s  (  0.327M/s/cpu)
> uprobe-ret  (32 cpus):    5.743 ± 0.003M/s  (  0.179M/s/cpu)
> uprobe-ret  (64 cpus):    4.726 ± 0.016M/s  (  0.074M/s/cpu)
>
> after-opt
> ---------
> uprobe-nop  ( 1 cpus):    0.985 ± 0.002M/s  (  0.985M/s/cpu)
> uprobe-nop  ( 2 cpus):    1.773 ± 0.005M/s  (  0.887M/s/cpu)
> uprobe-nop  ( 4 cpus):    3.304 ± 0.001M/s  (  0.826M/s/cpu)
> uprobe-nop  ( 8 cpus):    5.328 ± 0.002M/s  (  0.666M/s/cpu)
> uprobe-nop  (16 cpus):    6.475 ± 0.002M/s  (  0.405M/s/cpu)
> uprobe-nop  (32 cpus):    4.831 ± 0.082M/s  (  0.151M/s/cpu)
> uprobe-nop  (64 cpus):    2.564 ± 0.053M/s  (  0.040M/s/cpu)
>
> uprobe-push ( 1 cpus):    0.964 ± 0.001M/s  (  0.964M/s/cpu)
> uprobe-push ( 2 cpus):    1.766 ± 0.002M/s  (  0.883M/s/cpu)
> uprobe-push ( 4 cpus):    3.290 ± 0.009M/s  (  0.823M/s/cpu)
> uprobe-push ( 8 cpus):    4.670 ± 0.002M/s  (  0.584M/s/cpu)
> uprobe-push (16 cpus):    5.197 ± 0.004M/s  (  0.325M/s/cpu)
> uprobe-push (32 cpus):    5.068 ± 0.161M/s  (  0.158M/s/cpu)
> uprobe-push (64 cpus):    2.605 ± 0.026M/s  (  0.041M/s/cpu)
>
> uprobe-ret  ( 1 cpus):    1.833 ± 0.001M/s  (  1.833M/s/cpu)
> uprobe-ret  ( 2 cpus):    3.384 ± 0.003M/s  (  1.692M/s/cpu)
> uprobe-ret  ( 4 cpus):    6.677 ± 0.004M/s  (  1.669M/s/cpu)
> uprobe-ret  ( 8 cpus):    6.854 ± 0.005M/s  (  0.857M/s/cpu)
> uprobe-ret  (16 cpus):    6.508 ± 0.006M/s  (  0.407M/s/cpu)
> uprobe-ret  (32 cpus):    5.793 ± 0.009M/s  (  0.181M/s/cpu)
> uprobe-ret  (64 cpus):    4.743 ± 0.016M/s  (  0.074M/s/cpu)
>
> The benchmark results above demonstrate an obvious improvement in the
> scalability of trig-uprobe-nop and trig-uprobe-push, whose peak
> throughput rises from 4.5M/s to 6.4M/s and from 3.3M/s to 5.1M/s,
> respectively.
>
> v3->v2:
> Renaming the flag in [2/2], s/deny_signal/signal_denied/g.
>
> v2->v1:
> Oleg pointed out that _DENY_SIGNAL will be replaced by _ACK upon the
> completion of singlestep, which leaves handle_singlestep() no chance
> to restore the removed TIF_SIGPENDING [3], as well as some other cases
> in question. So this revision proposes to use a flag in uprobe_task to
> track the denied TIF_SIGPENDING instead of a new UPROBE_SSTEP state.
>
> [1] https://lore.kernel.org/all/20240731214256.3588718-1-andrii@kernel.org
> [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaochang1@huawei.com
> [3] https://lore.kernel.org/all/20240801082407.1618451-1-liaochang1@huawei.com
>
> Liao Chang (2):
>   uprobes: Remove redundant spinlock in uprobe_deny_signal()
>   uprobes: Remove the spinlock within handle_singlestep()
>
>  include/linux/uprobes.h |  1 +
>  kernel/events/uprobes.c | 10 +++++-----
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
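For readers following the v2 changelog in the cover letter, the idea of tracking a denied TIF_SIGPENDING with a task-private flag can be sketched in userspace C. This is only a model under stated assumptions, not the code from the patches: `struct model_task`, `MODEL_SIGPENDING`, and every `model_*` function are hypothetical names invented for illustration, and a C11 atomic stands in for the kernel's thread flag word. The point it tries to show is that the hide-and-restore round trip needs an atomic bit operation plus a flag that only the owning thread touches, so no lock is taken in the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TIF_SIGPENDING bit. */
#define MODEL_SIGPENDING (1u << 0)

/* Hypothetical model of per-task state; not the kernel's uprobe_task. */
struct model_task {
	atomic_uint flags;   /* models the thread flag word */
	bool in_singlestep;  /* models "a probed insn is being single-stepped" */
	bool signal_denied;  /* task-private: only the owning thread uses it */
};

static void model_task_init(struct model_task *t)
{
	atomic_init(&t->flags, 0);
	t->in_singlestep = false;
	t->signal_denied = false;
}

/* Other threads may post a signal at any time, hence the atomic OR. */
static void model_post_signal(struct model_task *t)
{
	atomic_fetch_or(&t->flags, MODEL_SIGPENDING);
}

static bool model_signal_pending(struct model_task *t)
{
	return atomic_load(&t->flags) & MODEL_SIGPENDING;
}

/*
 * Models the role of uprobe_deny_signal(): while single-stepping, hide a
 * pending signal and remember that it was hidden. Returns true when
 * signal delivery is denied for now.
 */
static bool model_deny_signal(struct model_task *t)
{
	if (!t->in_singlestep)
		return false;
	if (model_signal_pending(t)) {
		t->signal_denied = true;
		atomic_fetch_and(&t->flags, ~MODEL_SIGPENDING);
	}
	return true;
}

/*
 * Models the tail of handle_singlestep(): restore the pending bit if it
 * was hidden, using only the private flag and an atomic OR.
 */
static void model_finish_singlestep(struct model_task *t)
{
	assert(t->in_singlestep);
	t->in_singlestep = false;
	if (t->signal_denied) {
		atomic_fetch_or(&t->flags, MODEL_SIGPENDING);
		t->signal_denied = false;
	}
}

/* Returns 0 if the pending bit survives a single-step round trip. */
static int model_demo(void)
{
	struct model_task t;

	model_task_init(&t);
	t.in_singlestep = true;
	model_post_signal(&t);
	if (!model_deny_signal(&t) || model_signal_pending(&t))
		return 1;
	model_finish_singlestep(&t);
	return model_signal_pending(&t) ? 0 : 1;
}
```

Calling `model_demo()` walks through one round trip: a signal is posted, hidden during the single-step, and made pending again afterwards. Whether the real kernel paths need additional ordering or checks is outside what this model covers.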
Hi Liao,

On 09/14, Liao, Chang wrote:
>
> Hi, Oleg
>
> Kindly ping.
>
> This series has been pending for a month. Is there any issue I have
> overlooked?

Well, I have already acked both patches.

Please resend them to Peter/Masami, with my acks included.

Oleg.

Hi, Peter and Masami

I look forward to your input on this series. Andrii has proven the
patches are helpful for uprobe scalability.

Thanks.

On 2024/9/15 23:18, Oleg Nesterov wrote:
> Hi Liao,
>
> On 09/14, Liao, Chang wrote:
>>
>> Hi, Oleg
>>
>> Kindly ping.
>>
>> This series has been pending for a month. Is there any issue I have
>> overlooked?
>
> Well, I have already acked both patches.
>
> Please resend them to Peter/Masami, with my acks included.
>
> Oleg.
>
>

On Tue, Sep 17, 2024 at 7:05 PM Liao, Chang <liaochang1@huawei.com> wrote:
>
> Hi, Peter and Masami
>
> I look forward to your input on this series. Andrii has proven the
> patches are helpful for uprobe scalability.
>
> Thanks.
>
> On 2024/9/15 23:18, Oleg Nesterov wrote:
> > Hi Liao,
> >
> > On 09/14, Liao, Chang wrote:
> >>
> >> Hi, Oleg
> >>
> >> Kindly ping.
> >>
> >> This series has been pending for a month. Is there any issue I have
> >> overlooked?
> >
> > Well, I have already acked both patches.
> >
> > Please resend them to Peter/Masami, with my acks included.
> >

Hey Liao,

I didn't see v4 from you for this patch set with Oleg's acks. Did you
get a chance to rebase, add acks, and send the latest version?

> > Oleg.
> >
> >
>
> --
> BR
> Liao, Chang

On 2024/10/12 3:34, Andrii Nakryiko wrote:
> On Tue, Sep 17, 2024 at 7:05 PM Liao, Chang <liaochang1@huawei.com> wrote:
>>
>> Hi, Peter and Masami
>>
>> I look forward to your input on this series. Andrii has proven the
>> patches are helpful for uprobe scalability.
>>
>> Thanks.
>>
>> On 2024/9/15 23:18, Oleg Nesterov wrote:
>>> Hi Liao,
>>>
>>> On 09/14, Liao, Chang wrote:
>>>>
>>>> Hi, Oleg
>>>>
>>>> Kindly ping.
>>>>
>>>> This series has been pending for a month. Is there any issue I have
>>>> overlooked?
>>>
>>> Well, I have already acked both patches.
>>>
>>> Please resend them to Peter/Masami, with my acks included.
>>>
>
> Hey Liao,
>
> I didn't see v4 from you for this patch set with Oleg's acks. Did you
> get a chance to rebase, add acks, and send the latest version?

Andrii,

I am ready to send v4 based on the latest kernel from the next tree.
However, I haven't heard back from any of the maintainers except Oleg,
so I'm a bit unsure whether I should make further changes to this series.

>
>>> Oleg.
>>>
>>>
>>
>> --
>> BR
>> Liao, Chang

On Mon, Oct 21, 2024 at 3:43 AM Liao, Chang <liaochang1@huawei.com> wrote:
>
>
>
> On 2024/10/12 3:34, Andrii Nakryiko wrote:
> > On Tue, Sep 17, 2024 at 7:05 PM Liao, Chang <liaochang1@huawei.com> wrote:
> >>
> >> Hi, Peter and Masami
> >>
> >> I look forward to your input on this series. Andrii has proven the
> >> patches are helpful for uprobe scalability.
> >>
> >> Thanks.
> >>
> >> On 2024/9/15 23:18, Oleg Nesterov wrote:
> >>> Hi Liao,
> >>>
> >>> On 09/14, Liao, Chang wrote:
> >>>>
> >>>> Hi, Oleg
> >>>>
> >>>> Kindly ping.
> >>>>
> >>>> This series has been pending for a month. Is there any issue I have
> >>>> overlooked?
> >>>
> >>> Well, I have already acked both patches.
> >>>
> >>> Please resend them to Peter/Masami, with my acks included.
> >>>
> >
> > Hey Liao,
> >
> > I didn't see v4 from you for this patch set with Oleg's acks. Did you
> > get a chance to rebase, add acks, and send the latest version?
>
> Andrii,
>
> I am ready to send v4 based on the latest kernel from the next tree.
> However, I haven't heard back from any of the maintainers except Oleg,
> so I'm a bit unsure whether I should make further changes to this series.
>

Let's just rebase to the latest tip/perf/core and resend with Oleg's
ack. Hopefully this should be enough.

> >
> >>> Oleg.
> >>>
> >>>
> >>
> >> --
> >> BR
> >> Liao, Chang
>
> --
> BR
> Liao, Chang
>

On 2024/10/22 1:18, Andrii Nakryiko wrote:
> On Mon, Oct 21, 2024 at 3:43 AM Liao, Chang <liaochang1@huawei.com> wrote:
>>
>>
>>
>> On 2024/10/12 3:34, Andrii Nakryiko wrote:
>>> On Tue, Sep 17, 2024 at 7:05 PM Liao, Chang <liaochang1@huawei.com> wrote:
>>>>
>>>> Hi, Peter and Masami
>>>>
>>>> I look forward to your input on this series. Andrii has proven the
>>>> patches are helpful for uprobe scalability.
>>>>
>>>> Thanks.
>>>>
>>>> On 2024/9/15 23:18, Oleg Nesterov wrote:
>>>>> Hi Liao,
>>>>>
>>>>> On 09/14, Liao, Chang wrote:
>>>>>>
>>>>>> Hi, Oleg
>>>>>>
>>>>>> Kindly ping.
>>>>>>
>>>>>> This series has been pending for a month. Is there any issue I have
>>>>>> overlooked?
>>>>>
>>>>> Well, I have already acked both patches.
>>>>>
>>>>> Please resend them to Peter/Masami, with my acks included.
>>>>>
>>>
>>> Hey Liao,
>>>
>>> I didn't see v4 from you for this patch set with Oleg's acks. Did you
>>> get a chance to rebase, add acks, and send the latest version?
>>
>> Andrii,
>>
>> I am ready to send v4 based on the latest kernel from the next tree.
>> However, I haven't heard back from any of the maintainers except Oleg,
>> so I'm a bit unsure whether I should make further changes to this series.
>>
>
> Let's just rebase to the latest tip/perf/core and resend with Oleg's
> ack. Hopefully this should be enough.

OK, v4 is on the way with Masami's Acked-by.

>
>>>
>>>>> Oleg.
>>>>>
>>>>>
>>>>
>>>> --
>>>> BR
>>>> Liao, Chang
>>
>> --
>> BR
>> Liao, Chang
>>