Message ID | 20240201031950.3225626-1-maobibo@loongson.cn (mailing list archive)
---|---
Series | LoongArch: Add pv ipi support on LoongArch VM
Hi,

On 2/1/24 11:19, Bibo Mao wrote:
> [snip]
>
> Here is the microbenchmark data with the perf bench futex wake case on a
> 3C5000 single-way machine; there are 16 CPUs on a 3C5000 single-way
> machine, and the VM has 16 vCPUs as well. The benchmark data is the time
> in ms to wake up 16 threads; smaller is better.
>
> perf bench futex wake, Wokeup 16 of 16 threads in ms
>
>   --physical machine--   --VM original--   --VM with pv ipi patch--
>        0.0176 ms             0.1140 ms           0.0481 ms
>
> ---
> Change in V4:
> 1. Modify the pv ipi hook function names call_func_ipi() and
>    call_func_single_ipi() to send_ipi_mask()/send_ipi_single(), since pv
>    ipi is used for both remote function calls and reschedule notification.
> 2. Refresh the changelog.
>
> Change in V3:
> 1. Add 128-vcpu ipi multicast support like x86.
> 2. Change the cpucfg base address from 0x10000000 to 0x40000000, in order
>    to avoid conflict with future hw usage.
> 3. Adjust the patch order in this patchset; move the patch
>    Refine-ipi-ops-on-LoongArch-platform to the first one.

Sorry for the late reply (and Happy Chinese New Year), and thanks for providing microbenchmark numbers! But it seems the more comprehensive CoreMark results were omitted (they are also absent in v3)? While the changes between v4 and v2 shouldn't be performance-sensitive IMO (I haven't checked carefully, though), it would be better to showcase the improvements / non-harmfulness of the changes and make us confident in accepting them.
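For context, the relative improvement implied by the numbers above can be worked out as follows (a quick sketch; only the three timings are taken from the quoted table, everything else is illustrative):

```python
# Wakeup times in ms for 16 threads, copied from the reported table.
host = 0.0176        # physical machine
vm_original = 0.1140 # VM without the patch
vm_pv_ipi = 0.0481   # VM with the pv ipi patch

# Speedup of the PV IPI path over the original VM path.
speedup = vm_original / vm_pv_ipi
# Remaining overhead of the patched VM relative to bare metal.
overhead = vm_pv_ipi / host

print(f"PV IPI speedup over original VM: {speedup:.2f}x")
print(f"Remaining overhead vs. host:     {overhead:.2f}x")
```

So the patch roughly halves-and-more the VM wakeup latency (about 2.4x faster), while still leaving the guest a few times slower than bare metal.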
On 2/15/24 18:11, WANG Xuerui wrote:
> Sorry for the late reply (and Happy Chinese New Year), and thanks for
> providing microbenchmark numbers! But it seems the more comprehensive
> CoreMark results were omitted (that's also absent in v3)? While the

Of course the benchmark suite should be UnixBench instead of CoreMark. Lesson: don't multi-task code reviews, especially not after consuming beer -- a cup of coffee won't fully cancel the influence. ;-)
On 2024/2/15 6:25 PM, WANG Xuerui wrote:
> On 2/15/24 18:11, WANG Xuerui wrote:
>> Sorry for the late reply (and Happy Chinese New Year), and thanks for
>> providing microbenchmark numbers! But it seems the more comprehensive
>> CoreMark results were omitted (that's also absent in v3)? While the
>
> Of course the benchmark suite should be UnixBench instead of CoreMark.
> Lesson: don't multi-task code reviews, especially not after consuming
> beer -- a cup of coffee won't fully cancel the influence. ;-)
>
Where is the rule about benchmark choices like UnixBench/CoreMark for an IPI improvement?

Regards
Bibo Mao
On 2/17/24 11:15, maobibo wrote:
> On 2024/2/15 6:25 PM, WANG Xuerui wrote:
>> On 2/15/24 18:11, WANG Xuerui wrote:
>>> Sorry for the late reply (and Happy Chinese New Year), and thanks for
>>> providing microbenchmark numbers! But it seems the more comprehensive
>>> CoreMark results were omitted (that's also absent in v3)? While the
>>
>> Of course the benchmark suite should be UnixBench instead of CoreMark.
>> Lesson: don't multi-task code reviews, especially not after consuming
>> beer -- a cup of coffee won't fully cancel the influence. ;-)
>>
> Where is the rule about benchmark choices like UnixBench/CoreMark for an
> IPI improvement?

Sorry for the late reply. The rules are mostly unwritten, but in general you can think of the preference for benchmark suites as a matter of "effectiveness": the closer a benchmark is to some real workload in the wild, the better. Micro-benchmarks are okay for illustrating a point, but without demonstrating the impact on realistic workloads, a change could be "useless" in practice or even degrade various performance metrics (be that throughput, latency, or anything that matters in a given case), yet get accepted without notice.
On 2024/2/22 5:34 PM, WANG Xuerui wrote:
> On 2/17/24 11:15, maobibo wrote:
>> On 2024/2/15 6:25 PM, WANG Xuerui wrote:
>>> On 2/15/24 18:11, WANG Xuerui wrote:
>>>> Sorry for the late reply (and Happy Chinese New Year), and thanks
>>>> for providing microbenchmark numbers! But it seems the more
>>>> comprehensive CoreMark results were omitted (that's also absent in
>>>> v3)? While the
>>>
>>> Of course the benchmark suite should be UnixBench instead of
>>> CoreMark. Lesson: don't multi-task code reviews, especially not after
>>> consuming beer -- a cup of coffee won't fully cancel the influence. ;-)
>>>
>> Where is the rule about benchmark choices like UnixBench/CoreMark for
>> an IPI improvement?
>
> Sorry for the late reply. The rules are mostly unwritten, but in general
> you can think of the preference for benchmark suites as a matter of
> "effectiveness": the closer a benchmark is to some real workload in the
> wild, the better. Micro-benchmarks are okay for illustrating a point,
> but without demonstrating the impact on realistic workloads, a change
> could be "useless" in practice or even degrade various performance
> metrics (be that throughput, latency, or anything that matters in a
> given case), yet get accepted without notice.

Yes, a micro-benchmark cannot represent the real world; however, that does not mean that UnixBench/CoreMark should be run. You need to point out the negative effect of the code, or the possible real scenario which may benefit, and point out a reasonable benchmark sensitive to IPIs rather than blindly saying UnixBench/CoreMark.

Regards
Bibo Mao
On 2/22/24 18:06, maobibo wrote:
> On 2024/2/22 5:34 PM, WANG Xuerui wrote:
>> On 2/17/24 11:15, maobibo wrote:
>>> On 2024/2/15 6:25 PM, WANG Xuerui wrote:
>>>> On 2/15/24 18:11, WANG Xuerui wrote:
>>>>> Sorry for the late reply (and Happy Chinese New Year), and thanks
>>>>> for providing microbenchmark numbers! But it seems the more
>>>>> comprehensive CoreMark results were omitted (that's also absent in
>>>>> v3)? While the
>>>>
>>>> Of course the benchmark suite should be UnixBench instead of
>>>> CoreMark. Lesson: don't multi-task code reviews, especially not
>>>> after consuming beer -- a cup of coffee won't fully cancel the
>>>> influence. ;-)
>>>>
>>> Where is the rule about benchmark choices like UnixBench/CoreMark for
>>> an IPI improvement?
>>
>> Sorry for the late reply. The rules are mostly unwritten, but in
>> general you can think of the preference for benchmark suites as a
>> matter of "effectiveness": the closer a benchmark is to some real
>> workload in the wild, the better. Micro-benchmarks are okay for
>> illustrating a point, but without demonstrating the impact on
>> realistic workloads, a change could be "useless" in practice or even
>> degrade various performance metrics (be that throughput, latency, or
>> anything that matters in a given case), yet get accepted without
>> notice.
> Yes, a micro-benchmark cannot represent the real world; however, that
> does not mean that UnixBench/CoreMark should be run. You need to point
> out the negative effect of the code, or the possible real scenario
> which may benefit, and point out a reasonable benchmark sensitive to
> IPIs rather than blindly saying UnixBench/CoreMark.

I was not meaning to argue with you, nor was I implying that your changes "must be regressing things even though I didn't check myself". My point is: *any* comparison with a realistic workload that shows performance mostly unaffected inside/outside KVM would give reviewers (and yourself too) much more confidence in accepting the change. Personally, I think a microbenchmark could be enough, because the only externally visible change is the IPI mechanism overhead; but please consider other reviewers who may or may not be familiar enough with LoongArch to notice the "triviality". Also, given the 6-patch size of the series, it could hardly be considered "trivial".
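As an aside on how such an inside/outside-KVM comparison could be summarized: a minimal sketch, assuming per-run scores have already been collected on the host and in the guest (all names and numbers below are purely illustrative, not measurements from this series):

```python
from statistics import mean, stdev

# Hypothetical per-run scores (higher is better), e.g. from a suite like
# UnixBench, collected on the host and inside a KVM guest.
host_scores = [1021.3, 1018.7, 1025.1, 1019.9, 1022.4]
guest_scores = [998.2, 1003.5, 995.7, 1001.1, 999.4]

def summarize(name, scores):
    # Report mean and spread so reviewers can judge run-to-run noise.
    print(f"{name}: mean={mean(scores):.1f} stdev={stdev(scores):.1f}")

summarize("host ", host_scores)
summarize("guest", guest_scores)

# Relative difference of the means; a small value supports the claim
# that the change is not harmful to realistic workloads.
delta = (mean(host_scores) - mean(guest_scores)) / mean(host_scores)
print(f"guest vs host delta: {delta:.1%}")
```

Reporting mean plus spread (rather than a single number) is what makes a "mostly unaffected" claim checkable: a delta within the run-to-run noise is evidence of non-harmfulness.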