
[-V3,3/9] mm, pcp: reduce lock contention for draining high-order pages

Message ID 20231016053002.756205-4-ying.huang@intel.com (mailing list archive)
State New
Series mm: PCP high auto-tuning

Commit Message

Huang, Ying Oct. 16, 2023, 5:29 a.m. UTC
Since commit f26b3fa04611 ("mm/page_alloc: limit number of high-order
pages on PCP during bulk free"), the PCP (Per-CPU Pageset) is drained
when it is mostly used for freeing high-order pages, to improve the
reuse of cache-hot pages between the CPUs that allocate pages and the
CPUs that free them.

On a system with a small per-CPU data cache slice, pages shouldn't be
cached before draining, so that they are guaranteed to stay cache-hot.
But on a system with a large per-CPU data cache slice, some pages can
be cached before draining to reduce zone lock contention.

So, in this patch, instead of draining without any caching,
"pcp->batch" pages will be cached in PCP before draining if the
size of the per-CPU data cache slice is more than "3 * batch".

In theory, if the size of the per-CPU data cache slice is more than
"2 * batch", we can reuse cache-hot pages between CPUs.  But to leave
room for the other uses of the cache (code, other data accesses,
etc.), "3 * batch" is used.

Note: "3 * batch" is chosen to make sure the optimization works on
recent x86_64 server CPUs.  If you want to increase it, please check
whether it breaks the optimization.
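
The threshold rule described above can be illustrated with a small
user-space sketch.  This is not the code from the patch: the helper
names, the fixed 4 KiB page size and the byte-based slice size are
assumptions made only for illustration.  It shows just the decision
the commit message describes: keep "batch" pages in the PCP before
draining only when the per-CPU data cache slice holds more than
"3 * batch" pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE per arch. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper (not from the patch): may "batch" pages stay
 * cached in the PCP before a drain?  slice_bytes is the size of the
 * per-CPU data cache slice in bytes, batch is the PCP batch in pages.
 */
static bool pcp_can_cache_batch(size_t slice_bytes, unsigned int batch)
{
	size_t slice_pages = slice_bytes / SKETCH_PAGE_SIZE;

	/* Strictly more than 3 * batch, as the commit message states. */
	return slice_pages > (size_t)3 * batch;
}

/*
 * Hypothetical helper: how many pages the sketch keeps in the PCP
 * before draining.  Either a full batch or nothing at all.
 */
static unsigned int pcp_pages_kept(size_t slice_bytes, unsigned int batch)
{
	return pcp_can_cache_batch(slice_bytes, batch) ? batch : 0;
}
```

For example, assuming a batch of 63 pages, the threshold is 189 pages
(756 KiB with 4 KiB pages): a 1 MiB slice would keep one batch cached,
while a 512 KiB slice would keep nothing and drain immediately.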

On a 2-socket Intel server with 128 logical CPUs, with the patch, the
network bandwidth of the UNIX (AF_UNIX) test case of the lmbench test
suite with 16 pairs of processes increases by 70.5%.  The cycles% of
spinlock contention (mostly for the zone lock) decreases from 46.1% to
21.3%.  The number of PCP drains for high-order page freeing
(free_high) decreases by 89.9%.  The cache miss rate stays at 0.2%.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
---
 drivers/base/cacheinfo.c |  2 ++
 include/linux/gfp.h      |  1 +
 include/linux/mmzone.h   |  6 ++++++
 mm/page_alloc.c          | 38 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 46 insertions(+), 1 deletion(-)

Comments

kernel test robot Oct. 27, 2023, 6:23 a.m. UTC | #1
Hello,

kernel test robot noticed a 14.6% improvement of netperf.Throughput_Mbps on:


commit: f5ddc662f07d7d99e9cfc5e07778e26c7394caf8 ("[PATCH -V3 3/9] mm, pcp: reduce lock contention for draining high-order pages")
url: https://github.com/intel-lab-lkp/linux/commits/Huang-Ying/mm-pcp-avoid-to-drain-PCP-when-process-exit/20231017-143633
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git 36b2d7dd5a8ac95c8c1e69bdc93c4a6e2dc28a23
patch link: https://lore.kernel.org/all/20231016053002.756205-4-ying.huang@intel.com/
patch subject: [PATCH -V3 3/9] mm, pcp: reduce lock contention for draining high-order pages

testcase: netperf
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 200%
	cluster: cs-localhost
	send_size: 10K
	test: SCTP_STREAM_MANY
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231027/202310271441.71ce0a9-oliver.sang@intel.com

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-8.3/200%/debian-11.1-x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/SCTP_STREAM_MANY/netperf

commit: 
  c828e65251 ("cacheinfo: calculate size of per-CPU data cache slice")
  f5ddc662f0 ("mm, pcp: reduce lock contention for draining high-order pages")

c828e65251502516 f5ddc662f07d7d99e9cfc5e0777 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     26471           -11.1%      23520        uptime.idle
 2.098e+10           -14.1%  1.802e+10        cpuidle..time
 5.798e+08           +14.3%  6.628e+08        cpuidle..usage
 1.329e+09           +14.7%  1.525e+09        numa-numastat.node0.local_node
 1.329e+09           +14.7%  1.525e+09        numa-numastat.node0.numa_hit
 1.336e+09           +14.6%  1.531e+09        numa-numastat.node1.local_node
 1.336e+09           +14.6%  1.531e+09        numa-numastat.node1.numa_hit
 1.329e+09           +14.7%  1.525e+09        numa-vmstat.node0.numa_hit
 1.329e+09           +14.7%  1.525e+09        numa-vmstat.node0.numa_local
 1.336e+09           +14.6%  1.531e+09        numa-vmstat.node1.numa_hit
 1.336e+09           +14.6%  1.531e+09        numa-vmstat.node1.numa_local
     26.31 ± 12%     +33.0%      35.00 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.sctp_datamsg_from_user.sctp_sendmsg_to_asoc
    229.00 ± 13%     -24.7%     172.33 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.sctp_datamsg_from_user.sctp_sendmsg_to_asoc
    929.50 ±  2%      +8.2%       1005 ±  4%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
     26.30 ± 12%     +33.0%      35.00 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.sctp_datamsg_from_user.sctp_sendmsg_to_asoc
     53.98           -14.1%      46.36        vmstat.cpu.id
     58.15           +17.6%      68.37        vmstat.procs.r
   3720385           +15.6%    4301904        vmstat.system.cs
   1991764           +14.5%    2281507        vmstat.system.in
     53.69            -7.7       46.03        mpstat.cpu.all.idle%
      2.10            +0.3        2.44        mpstat.cpu.all.irq%
      7.25            +1.3        8.58        mpstat.cpu.all.soft%
     35.74            +5.7       41.46        mpstat.cpu.all.sys%
      1.23            +0.3        1.49        mpstat.cpu.all.usr%
   2047040            +2.9%    2105598        proc-vmstat.nr_file_pages
   1377160            +4.2%    1435588        proc-vmstat.nr_shmem
 2.665e+09           +14.7%  3.056e+09        proc-vmstat.numa_hit
 2.665e+09           +14.7%  3.056e+09        proc-vmstat.numa_local
 1.534e+10           +14.6%  1.758e+10        proc-vmstat.pgalloc_normal
 1.534e+10           +14.6%  1.758e+10        proc-vmstat.pgfree
      1296           +16.3%       1507        turbostat.Avg_MHz
     49.98            +8.1       58.12        turbostat.Busy%
 5.797e+08           +14.3%  6.628e+08        turbostat.C1
     53.88            -7.6       46.34        turbostat.C1%
     50.02           -16.3%      41.88        turbostat.CPU%c1
 6.081e+08           +14.5%  6.961e+08        turbostat.IRQ
    391.82            +3.5%     405.41        turbostat.PkgWatt
      2204           +14.6%       2527        netperf.ThroughputBoth_Mbps
    564378           +14.6%     647027        netperf.ThroughputBoth_total_Mbps
      2204           +14.6%       2527        netperf.Throughput_Mbps
    564378           +14.6%     647027        netperf.Throughput_total_Mbps
    146051            +5.9%     154705        netperf.time.involuntary_context_switches
      3011           +16.8%       3516        netperf.time.percent_of_cpu_this_job_got
      8875           +16.6%      10351        netperf.time.system_time
    221.39           +18.0%     261.14        netperf.time.user_time
   2759631            +8.0%    2981144        netperf.time.voluntary_context_switches
 2.067e+09           +14.6%  2.369e+09        netperf.workload
   2920531           +34.4%    3925407        sched_debug.cfs_rq:/.avg_vruntime.avg
   3172407 ±  2%     +36.5%    4331807 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.max
   2801767           +35.2%    3787891 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.min
     45404 ±  5%     +33.3%      60516 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
   2817265 ± 10%     +40.6%    3961862        sched_debug.cfs_rq:/.left_vruntime.max
    376003 ± 18%     +51.2%     568331 ± 13%  sched_debug.cfs_rq:/.left_vruntime.stddev
   2920531           +34.4%    3925407        sched_debug.cfs_rq:/.min_vruntime.avg
   3172407 ±  2%     +36.5%    4331807 ±  3%  sched_debug.cfs_rq:/.min_vruntime.max
   2801767           +35.2%    3787891 ±  2%  sched_debug.cfs_rq:/.min_vruntime.min
     45404 ±  5%     +33.3%      60516 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
   2817265 ± 10%     +40.6%    3961862        sched_debug.cfs_rq:/.right_vruntime.max
    376003 ± 18%     +51.2%     568331 ± 13%  sched_debug.cfs_rq:/.right_vruntime.stddev
    157.25 ±  6%     +13.3%     178.14 ±  4%  sched_debug.cfs_rq:/.util_est_enqueued.avg
   4361500           +15.5%    5035528        sched_debug.cpu.nr_switches.avg
   4674667           +14.7%    5363125        sched_debug.cpu.nr_switches.max
   3947619           +14.1%    4504637 ±  2%  sched_debug.cpu.nr_switches.min
      0.56            -3.7%       0.54        perf-stat.i.MPKI
 2.293e+10           +14.3%  2.622e+10        perf-stat.i.branch-instructions
 1.449e+08           +15.6%  1.675e+08        perf-stat.i.branch-misses
      2.15            -0.1        2.05        perf-stat.i.cache-miss-rate%
  67409238           +10.2%   74274510        perf-stat.i.cache-misses
 3.199e+09           +15.7%  3.702e+09        perf-stat.i.cache-references
   3765045           +15.6%    4353228        perf-stat.i.context-switches
      1.42            +1.7%       1.45        perf-stat.i.cpi
 1.717e+11           +16.5%      2e+11        perf-stat.i.cpu-cycles
      5094           +51.1%       7695 ±  3%  perf-stat.i.cpu-migrations
      2554            +5.7%       2699        perf-stat.i.cycles-between-cache-misses
  3.28e+10           +14.5%  3.756e+10        perf-stat.i.dTLB-loads
    329792 ± 11%     +37.3%     452936 ± 15%  perf-stat.i.dTLB-store-misses
  2.04e+10           +14.7%  2.339e+10        perf-stat.i.dTLB-stores
 1.205e+11           +14.4%  1.379e+11        perf-stat.i.instructions
      0.71            -1.7%       0.69        perf-stat.i.ipc
      1.34           +16.5%       1.56        perf-stat.i.metric.GHz
    221.29            +7.4%     237.74        perf-stat.i.metric.K/sec
    619.67           +14.5%     709.77        perf-stat.i.metric.M/sec
   7031738           +14.3%    8034255        perf-stat.i.node-load-misses
     79.94            -1.3       78.62        perf-stat.i.node-store-miss-rate%
   3349862 ±  2%      +9.2%    3656880        perf-stat.i.node-stores
      0.56            -3.7%       0.54        perf-stat.overall.MPKI
      2.11            -0.1        2.01        perf-stat.overall.cache-miss-rate%
      1.42            +1.8%       1.45        perf-stat.overall.cpi
      2546            +5.7%       2692        perf-stat.overall.cycles-between-cache-misses
      0.70            -1.8%       0.69        perf-stat.overall.ipc
     79.91            -1.4       78.54        perf-stat.overall.node-store-miss-rate%
 2.286e+10           +14.3%  2.614e+10        perf-stat.ps.branch-instructions
 1.444e+08           +15.6%  1.669e+08        perf-stat.ps.branch-misses
  67192773           +10.2%   74037940        perf-stat.ps.cache-misses
 3.189e+09           +15.7%   3.69e+09        perf-stat.ps.cache-references
   3753095           +15.6%    4339552        perf-stat.ps.context-switches
 1.711e+11           +16.5%  1.994e+11        perf-stat.ps.cpu-cycles
      5078           +51.1%       7674 ±  3%  perf-stat.ps.cpu-migrations
 3.269e+10           +14.5%  3.743e+10        perf-stat.ps.dTLB-loads
    328489 ± 11%     +37.3%     451131 ± 15%  perf-stat.ps.dTLB-store-misses
 2.033e+10           +14.7%  2.331e+10        perf-stat.ps.dTLB-stores
 1.201e+11           +14.4%  1.374e+11        perf-stat.ps.instructions
   7009249           +14.3%    8009170        perf-stat.ps.node-load-misses
   3339511 ±  2%      +9.2%    3645997        perf-stat.ps.node-stores
 3.635e+13           +14.3%  4.155e+13        perf-stat.total.instructions
      4.40 ±  2%      -1.5        2.87        perf-profile.calltrace.cycles-pp.skb_release_data.kfree_skb_reason.sctp_recvmsg.inet_recvmsg.sock_recvmsg
      5.83            -1.4        4.41        perf-profile.calltrace.cycles-pp.kfree_skb_reason.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
      1.92 ±  3%      -1.4        0.55        perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.kfree_skb_reason.sctp_recvmsg.inet_recvmsg
     22.33            -1.3       21.03        perf-profile.calltrace.cycles-pp.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg
     22.42            -1.3       21.12        perf-profile.calltrace.cycles-pp.inet_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg
     22.75            -1.3       21.48        perf-profile.calltrace.cycles-pp.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64
     23.44            -1.2       22.20        perf-profile.calltrace.cycles-pp.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
     24.65            -1.2       23.47        perf-profile.calltrace.cycles-pp.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg
     25.14            -1.2       23.98        perf-profile.calltrace.cycles-pp.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg
     25.46            -1.1       24.31        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg
     25.59            -1.1       24.46        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvmsg
     26.47            -1.1       25.36        perf-profile.calltrace.cycles-pp.recvmsg
      3.57 ±  6%      -0.6        2.93 ±  9%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.__kmalloc_large_node.__kmalloc_node_track_caller
      5.22 ±  2%      -0.4        4.79        perf-profile.calltrace.cycles-pp.__alloc_pages.__kmalloc_large_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
      4.76 ±  2%      -0.4        4.33        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.__kmalloc_large_node.__kmalloc_node_track_caller.kmalloc_reserve
      0.96            -0.4        0.59 ±  2%  perf-profile.calltrace.cycles-pp.release_sock.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
      3.16 ±  2%      -0.3        2.84        perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.sctp_packet_transmit.sctp_outq_flush
      3.14 ±  2%      -0.3        2.82        perf-profile.calltrace.cycles-pp.__kmalloc_large_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.sctp_packet_transmit
      3.18 ±  2%      -0.3        2.86        perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.sctp_packet_transmit.sctp_outq_flush.sctp_cmd_interpreter
      3.44 ±  2%      -0.3        3.13        perf-profile.calltrace.cycles-pp.__alloc_skb.sctp_packet_transmit.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm
      1.62 ±  3%      -0.3        1.34 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.rmqueue.get_page_from_freelist.__alloc_pages.__kmalloc_large_node
      1.49 ±  3%      -0.3        1.22 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.rmqueue.get_page_from_freelist.__alloc_pages
      1.46 ±  2%      -0.2        1.25 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__free_pages_ok.skb_release_data.kfree_skb_reason
      1.62 ±  2%      -0.2        1.43 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__free_pages_ok.skb_release_data.kfree_skb_reason.sctp_recvmsg
      1.99 ±  2%      -0.2        1.80        perf-profile.calltrace.cycles-pp.__free_pages_ok.skb_release_data.kfree_skb_reason.sctp_recvmsg.inet_recvmsg
      0.76            -0.2        0.58        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.85            -0.1        0.74        perf-profile.calltrace.cycles-pp.__slab_free.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
      0.84            -0.1        0.73        perf-profile.calltrace.cycles-pp.free_unref_page_commit.free_unref_page.skb_release_data.consume_skb.sctp_chunk_put
      1.37            -0.1        1.28        perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.consume_skb.sctp_chunk_put.sctp_outq_sack
      2.65            -0.1        2.57        perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb._sctp_make_chunk.sctp_make_datafrag_empty.sctp_datamsg_from_user
      2.56            -0.1        2.48        perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb._sctp_make_chunk.sctp_make_datafrag_empty
      2.49 ±  2%      -0.1        2.42        perf-profile.calltrace.cycles-pp.__kmalloc_large_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb._sctp_make_chunk
      1.92            -0.1        1.85        perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.sctp_chunk_put.sctp_outq_sack.sctp_cmd_interpreter
      0.62            +0.0        0.64        perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.sctp_recvmsg.inet_recvmsg
      0.65            +0.0        0.68        perf-profile.calltrace.cycles-pp.sctp_chunk_put.sctp_ulpevent_free.sctp_recvmsg.inet_recvmsg.sock_recvmsg
      0.89            +0.0        0.93        perf-profile.calltrace.cycles-pp.copy_msghdr_from_user.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.24            +0.0        1.28        perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sctp_data_ready
      0.56 ±  2%      +0.0        0.60        perf-profile.calltrace.cycles-pp.sctp_packet_config.sctp_outq_select_transport.sctp_outq_flush_data.sctp_outq_flush.sctp_cmd_interpreter
      1.32            +0.0        1.36        perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sctp_data_ready.sctp_ulpq_tail_event
      1.29            +0.0        1.33        perf-profile.calltrace.cycles-pp.sctp_ulpevent_free.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
      0.71 ±  2%      +0.0        0.75        perf-profile.calltrace.cycles-pp.sctp_outq_select_transport.sctp_outq_flush_data.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm
      0.61            +0.0        0.66        perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
      0.62            +0.1        0.67        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending
      1.50            +0.1        1.56        perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sctp_data_ready.sctp_ulpq_tail_event.sctp_ulpq_tail_data
      1.58            +0.1        1.64        perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sctp_data_ready.sctp_ulpq_tail_event.sctp_ulpq_tail_data.sctp_cmd_interpreter
      0.70            +0.1        0.76        perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.sctp_skb_recv_datagram
      1.02            +0.1        1.08        perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single
      2.02            +0.1        2.09        perf-profile.calltrace.cycles-pp.sctp_outq_flush_data.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm.sctp_primitive_SEND
      1.86            +0.1        1.93        perf-profile.calltrace.cycles-pp.sctp_data_ready.sctp_ulpq_tail_event.sctp_ulpq_tail_data.sctp_cmd_interpreter.sctp_do_sm
      0.76            +0.1        0.83        perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single
      0.73            +0.1        0.80        perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
      0.89            +0.1        0.96        perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      0.82            +0.1        0.89        perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put
      0.95            +0.1        1.03        perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      2.06            +0.1        2.14        perf-profile.calltrace.cycles-pp.sctp_ulpq_tail_event.sctp_ulpq_tail_data.sctp_cmd_interpreter.sctp_do_sm.sctp_assoc_bh_rcv
      3.68            +0.1        3.76        perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.sctp_user_addto_chunk.sctp_datamsg_from_user.sctp_sendmsg_to_asoc
      0.98            +0.1        1.06        perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.skb_release_head_state.kfree_skb_reason.sctp_recvmsg.inet_recvmsg
      1.34            +0.1        1.43        perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single
      1.38            +0.1        1.47        perf-profile.calltrace.cycles-pp.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_outq_sack
      1.54            +0.1        1.63        perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_outq_sack.sctp_cmd_interpreter
      1.25 ±  2%      +0.1        1.35        perf-profile.calltrace.cycles-pp.__sk_mem_raise_allocated.__sk_mem_schedule.sctp_sendmsg_to_asoc.sctp_sendmsg.sock_sendmsg
      1.28 ±  2%      +0.1        1.38        perf-profile.calltrace.cycles-pp.__sk_mem_schedule.sctp_sendmsg_to_asoc.sctp_sendmsg.sock_sendmsg.____sys_sendmsg
      1.82            +0.1        1.93        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt
      2.00            +0.1        2.11        perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter
      1.39            +0.1        1.50        perf-profile.calltrace.cycles-pp.skb_release_head_state.kfree_skb_reason.sctp_recvmsg.inet_recvmsg.sock_recvmsg
      4.39            +0.1        4.51        perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      2.68            +0.2        2.84        perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      2.98            +0.2        3.14        perf-profile.calltrace.cycles-pp.sctp_ulpevent_make_rcvmsg.sctp_ulpq_tail_data.sctp_cmd_interpreter.sctp_do_sm.sctp_assoc_bh_rcv
      1.88            +0.2        2.06        perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg
      0.34 ± 70%      +0.2        0.54        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.schedule_timeout.sctp_skb_recv_datagram
     10.32            +0.2       10.53        perf-profile.calltrace.cycles-pp.sctp_do_sm.sctp_primitive_SEND.sctp_sendmsg_to_asoc.sctp_sendmsg.sock_sendmsg
      3.60            +0.2        3.81        perf-profile.calltrace.cycles-pp.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
      1.94            +0.2        2.14        perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
      2.20            +0.2        2.41        perf-profile.calltrace.cycles-pp.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg.sock_recvmsg
     10.93            +0.2       11.16        perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     10.51            +0.2       10.74        perf-profile.calltrace.cycles-pp.sctp_cmd_interpreter.sctp_do_sm.sctp_primitive_SEND.sctp_sendmsg_to_asoc.sctp_sendmsg
      7.26            +0.2        7.50        perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.sctp_recvmsg
     11.17            +0.2       11.42        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      5.40            +0.2        5.64        perf-profile.calltrace.cycles-pp.sctp_ulpq_tail_data.sctp_cmd_interpreter.sctp_do_sm.sctp_assoc_bh_rcv.sctp_rcv
     11.25            +0.2       11.50        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      7.38            +0.3        7.64        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.sctp_recvmsg.inet_recvmsg
     20.03            +0.3       20.29        perf-profile.calltrace.cycles-pp.sctp_backlog_rcv.__release_sock.release_sock.sctp_sendmsg.sock_sendmsg
     20.09            +0.3       20.36        perf-profile.calltrace.cycles-pp.__release_sock.release_sock.sctp_sendmsg.sock_sendmsg.____sys_sendmsg
     20.30            +0.3       20.57        perf-profile.calltrace.cycles-pp.release_sock.sctp_sendmsg.sock_sendmsg.____sys_sendmsg.___sys_sendmsg
      8.40            +0.3        8.68        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.sctp_recvmsg.inet_recvmsg.sock_recvmsg
      8.44            +0.3        8.72        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.sctp_recvmsg.inet_recvmsg.sock_recvmsg.____sys_recvmsg
     11.85            +0.3       12.14        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     11.22            +0.3       11.52        perf-profile.calltrace.cycles-pp.sctp_packet_transmit.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm.sctp_primitive_SEND
     13.26            +0.3       13.61        perf-profile.calltrace.cycles-pp.sctp_outq_flush.sctp_cmd_interpreter.sctp_do_sm.sctp_primitive_SEND.sctp_sendmsg_to_asoc
     13.21            +0.4       13.59        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     13.25            +0.4       13.64        perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     13.24            +0.4       13.62        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     13.34            +0.4       13.74        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     15.70            +0.4       16.12        perf-profile.calltrace.cycles-pp.sctp_primitive_SEND.sctp_sendmsg_to_asoc.sctp_sendmsg.sock_sendmsg.____sys_sendmsg
      0.55            +0.5        1.02 ± 19%  perf-profile.calltrace.cycles-pp.__sk_mem_raise_allocated.__sk_mem_schedule.sctp_ulpevent_make_rcvmsg.sctp_ulpq_tail_data.sctp_cmd_interpreter
      0.66 ± 28%      +0.5        1.14        perf-profile.calltrace.cycles-pp.__sk_mem_schedule.sctp_ulpevent_make_rcvmsg.sctp_ulpq_tail_data.sctp_cmd_interpreter.sctp_do_sm
      0.00            +0.5        0.54        perf-profile.calltrace.cycles-pp.sctp_sf_eat_data_6_2.sctp_do_sm.sctp_assoc_bh_rcv.sctp_rcv.ip_protocol_deliver_rcu
     51.26            +0.5       51.80        perf-profile.calltrace.cycles-pp.sctp_sendmsg.sock_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg
     15.28            +0.5       15.82        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
     51.76            +0.6       52.32        perf-profile.calltrace.cycles-pp.sock_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64
     53.77            +0.6       54.34        perf-profile.calltrace.cycles-pp.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.06 ±  2%      -2.4        3.68        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      5.94 ±  2%      -2.2        3.75        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      6.84            -1.9        4.97        perf-profile.children.cycles-pp.skb_release_data
      3.64            -1.7        1.92        perf-profile.children.cycles-pp.free_unref_page
      2.04 ±  2%      -1.7        0.34 ±  2%  perf-profile.children.cycles-pp.free_pcppages_bulk
      5.84            -1.4        4.42        perf-profile.children.cycles-pp.kfree_skb_reason
     22.43            -1.3       21.14        perf-profile.children.cycles-pp.inet_recvmsg
     22.67            -1.3       21.39        perf-profile.children.cycles-pp.sctp_recvmsg
     22.76            -1.3       21.50        perf-profile.children.cycles-pp.sock_recvmsg
     23.46            -1.2       22.22        perf-profile.children.cycles-pp.____sys_recvmsg
     24.68            -1.2       23.50        perf-profile.children.cycles-pp.___sys_recvmsg
     25.16            -1.2       24.00        perf-profile.children.cycles-pp.__sys_recvmsg
     26.69            -1.1       25.59        perf-profile.children.cycles-pp.recvmsg
     82.77            -0.5       82.24        perf-profile.children.cycles-pp.do_syscall_64
     83.14            -0.5       82.63        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      5.02            -0.5        4.53        perf-profile.children.cycles-pp.get_page_from_freelist
      5.46            -0.5        4.98        perf-profile.children.cycles-pp.__alloc_pages
      5.96            -0.5        5.50        perf-profile.children.cycles-pp.__kmalloc_node_track_caller
      6.21            -0.5        5.76        perf-profile.children.cycles-pp.kmalloc_reserve
      3.86            -0.5        3.41        perf-profile.children.cycles-pp.rmqueue
      5.88            -0.5        5.44        perf-profile.children.cycles-pp.__kmalloc_large_node
      7.47            -0.4        7.07        perf-profile.children.cycles-pp.__alloc_skb
      0.65 ±  3%      -0.3        0.30 ±  5%  perf-profile.children.cycles-pp.sctp_wait_for_sndbuf
      1.91            -0.3        1.58        perf-profile.children.cycles-pp._raw_spin_lock_bh
      1.78            -0.3        1.46        perf-profile.children.cycles-pp.lock_sock_nested
      4.43            -0.2        4.22        perf-profile.children.cycles-pp.consume_skb
      6.00            -0.2        5.80        perf-profile.children.cycles-pp.sctp_outq_sack
      5.82            -0.2        5.62        perf-profile.children.cycles-pp.sctp_chunk_put
      2.00 ±  2%      -0.2        1.82 ±  2%  perf-profile.children.cycles-pp.__free_pages_ok
      1.20            -0.2        1.04        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.27            -0.1        1.16        perf-profile.children.cycles-pp.__slab_free
      0.39            -0.1        0.32 ±  2%  perf-profile.children.cycles-pp.__free_one_page
      0.86 ±  2%      -0.1        0.79        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.42            -0.1        0.36 ±  2%  perf-profile.children.cycles-pp.__zone_watermark_ok
      0.45 ±  2%      -0.1        0.40 ±  2%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.54            -0.0        0.51        perf-profile.children.cycles-pp.__list_add_valid_or_report
      0.65 ±  2%      -0.0        0.62        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.47 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.__kmalloc
      0.25 ±  3%      -0.0        0.22 ±  2%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.24 ±  4%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.24 ±  3%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
      0.15 ±  5%      -0.0        0.13 ±  4%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
      0.11 ±  4%      -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.sctp_assoc_rwnd_increase
      0.06            +0.0        0.07        perf-profile.children.cycles-pp.ct_idle_exit
      0.12 ±  3%      +0.0        0.13 ±  2%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.42            +0.0        0.44        perf-profile.children.cycles-pp.free_unref_page_prepare
      0.14 ±  2%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.check_stack_object
      0.13 ±  2%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
      0.27            +0.0        0.28        perf-profile.children.cycles-pp.update_curr
      0.22            +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.__switch_to_asm
      0.16 ±  2%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.29 ±  2%      +0.0        0.30        perf-profile.children.cycles-pp.sctp_outq_flush_ctrl
      0.42            +0.0        0.44        perf-profile.children.cycles-pp.free_large_kmalloc
      0.13 ±  2%      +0.0        0.15 ±  7%  perf-profile.children.cycles-pp.update_cfs_group
      0.40            +0.0        0.42 ±  2%  perf-profile.children.cycles-pp.loopback_xmit
      0.24 ±  3%      +0.0        0.26 ±  2%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.45            +0.0        0.47        perf-profile.children.cycles-pp.dev_hard_start_xmit
      0.20            +0.0        0.23 ±  3%  perf-profile.children.cycles-pp.set_next_entity
      0.63            +0.0        0.65        perf-profile.children.cycles-pp.simple_copy_to_iter
      0.13 ±  3%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.sk_leave_memory_pressure
      0.30            +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.sctp_inet_skb_msgname
      0.54 ±  2%      +0.0        0.57 ±  2%  perf-profile.children.cycles-pp.__copy_skb_header
      0.31            +0.0        0.34        perf-profile.children.cycles-pp.___perf_sw_event
      0.27 ±  3%      +0.0        0.30 ±  2%  perf-profile.children.cycles-pp.security_socket_recvmsg
      0.24 ±  3%      +0.0        0.26        perf-profile.children.cycles-pp.ipv4_dst_check
      0.42 ±  2%      +0.0        0.44 ±  3%  perf-profile.children.cycles-pp.page_counter_try_charge
      1.30            +0.0        1.33        perf-profile.children.cycles-pp.try_to_wake_up
      0.42            +0.0        0.45        perf-profile.children.cycles-pp.__mod_node_page_state
      0.79            +0.0        0.82        perf-profile.children.cycles-pp.__skb_clone
      0.44            +0.0        0.48        perf-profile.children.cycles-pp.aa_sk_perm
      0.30            +0.0        0.33 ±  4%  perf-profile.children.cycles-pp.accept_connection
      0.30            +0.0        0.33 ±  4%  perf-profile.children.cycles-pp.spawn_child
      0.30            +0.0        0.33 ±  4%  perf-profile.children.cycles-pp.process_requests
      0.36            +0.0        0.40        perf-profile.children.cycles-pp.prepare_task_switch
      0.28 ±  2%      +0.0        0.31 ±  5%  perf-profile.children.cycles-pp.recv_sctp_stream_1toMany
      0.66            +0.0        0.70        perf-profile.children.cycles-pp.sctp_addrs_lookup_transport
      0.69            +0.0        0.72        perf-profile.children.cycles-pp.__sctp_rcv_lookup
      0.39 ±  3%      +0.0        0.43        perf-profile.children.cycles-pp.dst_release
      1.36            +0.0        1.40        perf-profile.children.cycles-pp.autoremove_wake_function
      0.77            +0.0        0.81        perf-profile.children.cycles-pp.kmem_cache_alloc_node
      1.31            +0.0        1.35        perf-profile.children.cycles-pp.sctp_ulpevent_free
      0.92            +0.0        0.96        perf-profile.children.cycles-pp.try_charge_memcg
      0.64            +0.0        0.69        perf-profile.children.cycles-pp.dequeue_entity
      0.83            +0.0        0.88        perf-profile.children.cycles-pp.sctp_packet_config
      2.48            +0.0        2.53        perf-profile.children.cycles-pp.copy_msghdr_from_user
      0.61 ±  3%      +0.1        0.66 ±  2%  perf-profile.children.cycles-pp.mem_cgroup_uncharge_skmem
      0.66            +0.1        0.71        perf-profile.children.cycles-pp.enqueue_entity
      1.56            +0.1        1.61        perf-profile.children.cycles-pp.__wake_up_common
      1.39            +0.1        1.45        perf-profile.children.cycles-pp.kmem_cache_free
      1.02 ±  2%      +0.1        1.08        perf-profile.children.cycles-pp.sctp_outq_select_transport
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.pick_next_task_idle
      1.64            +0.1        1.70        perf-profile.children.cycles-pp.__wake_up_common_lock
      0.86 ±  3%      +0.1        0.92        perf-profile.children.cycles-pp.pick_next_task_fair
      0.58            +0.1        0.64        perf-profile.children.cycles-pp.update_load_avg
      1.56            +0.1        1.62        perf-profile.children.cycles-pp.__check_object_size
      0.71            +0.1        0.77        perf-profile.children.cycles-pp.dequeue_task_fair
      0.86            +0.1        0.93        perf-profile.children.cycles-pp.sctp_eat_data
      1.92            +0.1        1.99        perf-profile.children.cycles-pp.sctp_data_ready
      1.05            +0.1        1.12        perf-profile.children.cycles-pp.ttwu_do_activate
      0.26 ± 32%      +0.1        0.33 ±  4%  perf-profile.children.cycles-pp.accept_connections
      2.16            +0.1        2.22        perf-profile.children.cycles-pp.sctp_ulpq_tail_event
      0.76            +0.1        0.83        perf-profile.children.cycles-pp.enqueue_task_fair
      0.78            +0.1        0.86        perf-profile.children.cycles-pp.activate_task
      0.98            +0.1        1.05        perf-profile.children.cycles-pp.sctp_sf_eat_data_6_2
      0.97            +0.1        1.04        perf-profile.children.cycles-pp.schedule_idle
      3.22            +0.1        3.30        perf-profile.children.cycles-pp.sctp_outq_flush_data
      1.78            +0.1        1.85        perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
      1.48            +0.1        1.56        perf-profile.children.cycles-pp.sctp_wfree
      1.38            +0.1        1.46        perf-profile.children.cycles-pp.sched_ttwu_pending
      3.80            +0.1        3.89        perf-profile.children.cycles-pp.copyin
      3.92            +0.1        4.00        perf-profile.children.cycles-pp._copy_from_iter
     10.14            +0.1       10.24        perf-profile.children.cycles-pp.sctp_datamsg_from_user
      1.87            +0.1        1.97        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      4.48            +0.1        4.59        perf-profile.children.cycles-pp.sctp_user_addto_chunk
      2.04            +0.1        2.15        perf-profile.children.cycles-pp.__sysvec_call_function_single
      6.96            +0.1        7.09        perf-profile.children.cycles-pp.__memcpy
      7.57            +0.1        7.71        perf-profile.children.cycles-pp.sctp_packet_pack
      3.20            +0.1        3.34        perf-profile.children.cycles-pp.sctp_ulpevent_make_rcvmsg
      1.85            +0.2        2.00        perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
     12.41            +0.2       12.56        perf-profile.children.cycles-pp.sctp_rcv
      2.74            +0.2        2.90        perf-profile.children.cycles-pp.sysvec_call_function_single
      2.41            +0.2        2.57        perf-profile.children.cycles-pp.__sk_mem_raise_allocated
      2.48            +0.2        2.65        perf-profile.children.cycles-pp.__sk_mem_schedule
     13.86            +0.2       14.04        perf-profile.children.cycles-pp.__do_softirq
     13.28            +0.2       13.45        perf-profile.children.cycles-pp.process_backlog
     13.31            +0.2       13.49        perf-profile.children.cycles-pp.__napi_poll
     13.45            +0.2       13.63        perf-profile.children.cycles-pp.net_rx_action
      2.04            +0.2        2.21        perf-profile.children.cycles-pp.schedule
      2.28            +0.2        2.46        perf-profile.children.cycles-pp.schedule_timeout
     12.53            +0.2       12.71        perf-profile.children.cycles-pp.ip_local_deliver_finish
     13.05            +0.2       13.23        perf-profile.children.cycles-pp.__netif_receive_skb_one_core
     12.51            +0.2       12.69        perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
     29.73            +0.2       29.92        perf-profile.children.cycles-pp.sctp_outq_flush
      3.63            +0.2        3.84        perf-profile.children.cycles-pp.sctp_skb_recv_datagram
     13.78            +0.2       13.98        perf-profile.children.cycles-pp.do_softirq
      5.68            +0.2        5.89        perf-profile.children.cycles-pp.sctp_ulpq_tail_data
     13.98            +0.2       14.20        perf-profile.children.cycles-pp.__local_bh_enable_ip
      3.22            +0.2        3.44        perf-profile.children.cycles-pp.skb_release_head_state
      2.90            +0.2        3.13        perf-profile.children.cycles-pp.__schedule
     36.67            +0.2       36.90        perf-profile.children.cycles-pp.sctp_do_sm
     36.13            +0.2       36.36        perf-profile.children.cycles-pp.sctp_cmd_interpreter
     10.99            +0.2       11.22        perf-profile.children.cycles-pp.acpi_safe_halt
      7.30            +0.2        7.54        perf-profile.children.cycles-pp.copyout
     14.37            +0.2       14.61        perf-profile.children.cycles-pp.__dev_queue_xmit
     11.01            +0.2       11.26        perf-profile.children.cycles-pp.acpi_idle_enter
     14.53            +0.2       14.78        perf-profile.children.cycles-pp.ip_finish_output2
      7.40            +0.3        7.65        perf-profile.children.cycles-pp._copy_to_iter
     15.04            +0.3       15.29        perf-profile.children.cycles-pp.__ip_queue_xmit
     11.26            +0.3       11.52        perf-profile.children.cycles-pp.cpuidle_enter_state
     11.33            +0.3       11.59        perf-profile.children.cycles-pp.cpuidle_enter
     29.10            +0.3       29.37        perf-profile.children.cycles-pp.sctp_sendmsg_to_asoc
      8.41            +0.3        8.69        perf-profile.children.cycles-pp.__skb_datagram_iter
      8.45            +0.3        8.73        perf-profile.children.cycles-pp.skb_copy_datagram_iter
     11.94            +0.3       12.25        perf-profile.children.cycles-pp.cpuidle_idle_call
      9.15            +0.4        9.52        perf-profile.children.cycles-pp.asm_sysvec_call_function_single
     13.25            +0.4       13.64        perf-profile.children.cycles-pp.start_secondary
     13.32            +0.4       13.71        perf-profile.children.cycles-pp.do_idle
     13.34            +0.4       13.74        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     13.34            +0.4       13.74        perf-profile.children.cycles-pp.cpu_startup_entry
     16.00            +0.4       16.41        perf-profile.children.cycles-pp.sctp_primitive_SEND
     52.23            +0.6       52.80        perf-profile.children.cycles-pp.sock_sendmsg
     52.14            +0.6       52.72        perf-profile.children.cycles-pp.sctp_sendmsg
     54.28            +0.6       54.87        perf-profile.children.cycles-pp.____sys_sendmsg
     56.24            +0.6       56.85        perf-profile.children.cycles-pp.___sys_sendmsg
     56.83            +0.6       57.45        perf-profile.children.cycles-pp.__sys_sendmsg
      6.05 ±  2%      -2.4        3.68        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.97            -0.2        0.81        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      1.26            -0.1        1.14        perf-profile.self.cycles-pp.__slab_free
      1.22            -0.1        1.14        perf-profile.self.cycles-pp.rmqueue
      0.40            -0.1        0.35 ±  2%  perf-profile.self.cycles-pp.__zone_watermark_ok
      0.46            -0.0        0.42        perf-profile.self.cycles-pp.__list_add_valid_or_report
      0.18 ±  4%      -0.0        0.16 ±  4%  perf-profile.self.cycles-pp.__free_one_page
      0.15 ±  5%      -0.0        0.13 ±  4%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
      0.10 ±  3%      +0.0        0.11        perf-profile.self.cycles-pp._copy_to_iter
      0.31            +0.0        0.32        perf-profile.self.cycles-pp.sctp_v4_xmit
      0.24            +0.0        0.26 ±  2%  perf-profile.self.cycles-pp.__sys_sendmsg
      0.06 ±  7%      +0.0        0.08        perf-profile.self.cycles-pp.dequeue_task_fair
      0.07 ±  5%      +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.newidle_balance
      0.40            +0.0        0.42        perf-profile.self.cycles-pp.sctp_skb_recv_datagram
      0.19 ±  3%      +0.0        0.20 ±  2%  perf-profile.self.cycles-pp.menu_select
      0.11 ±  4%      +0.0        0.13 ±  4%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.37            +0.0        0.39        perf-profile.self.cycles-pp.__check_object_size
      0.78            +0.0        0.80        perf-profile.self.cycles-pp._raw_spin_lock_bh
      0.22 ±  2%      +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.21            +0.0        0.23        perf-profile.self.cycles-pp.__switch_to_asm
      0.27            +0.0        0.30        perf-profile.self.cycles-pp.___perf_sw_event
      0.38            +0.0        0.40 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12 ±  3%      +0.0        0.15 ±  6%  perf-profile.self.cycles-pp.update_cfs_group
      0.35            +0.0        0.38        perf-profile.self.cycles-pp.____sys_recvmsg
      0.05            +0.0        0.08 ±  6%  perf-profile.self.cycles-pp.schedule
      0.20            +0.0        0.22 ±  2%  perf-profile.self.cycles-pp.update_load_avg
      0.28 ±  2%      +0.0        0.31        perf-profile.self.cycles-pp.sctp_inet_skb_msgname
      0.23 ±  3%      +0.0        0.25        perf-profile.self.cycles-pp.ipv4_dst_check
      0.12 ±  4%      +0.0        0.14 ±  3%  perf-profile.self.cycles-pp.sk_leave_memory_pressure
      0.58            +0.0        0.61 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_node
      0.41            +0.0        0.44 ±  2%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.36            +0.0        0.39 ±  2%  perf-profile.self.cycles-pp.aa_sk_perm
      0.27 ±  3%      +0.0        0.30 ±  5%  perf-profile.self.cycles-pp.recv_sctp_stream_1toMany
      0.78            +0.0        0.82        perf-profile.self.cycles-pp.sctp_recvmsg
      0.71            +0.0        0.74        perf-profile.self.cycles-pp.sctp_sendmsg
      0.38 ±  3%      +0.0        0.42 ±  2%  perf-profile.self.cycles-pp.dst_release
      1.36            +0.1        1.42        perf-profile.self.cycles-pp.kmem_cache_free
      0.51 ±  2%      +0.1        0.58 ±  2%  perf-profile.self.cycles-pp.__sk_mem_raise_allocated
      0.63            +0.1        0.70 ±  2%  perf-profile.self.cycles-pp.sctp_eat_data
      0.47 ±  4%      +0.1        0.55 ±  3%  perf-profile.self.cycles-pp.__sk_mem_reduce_allocated
      3.77            +0.1        3.86        perf-profile.self.cycles-pp.copyin
      6.90            +0.1        7.03        perf-profile.self.cycles-pp.__memcpy
      7.30            +0.2        7.45        perf-profile.self.cycles-pp.acpi_safe_halt
      7.26            +0.2        7.49        perf-profile.self.cycles-pp.copyout




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
kernel test robot Nov. 6, 2023, 6:22 a.m. UTC | #2
Hi Huang Ying,

Sorry for the lateness of this report.
We previously reported
"a 14.6% improvement of netperf.Throughput_Mbps"
in
https://lore.kernel.org/all/202310271441.71ce0a9-oliver.sang@intel.com/

Later, our auto-bisect tool captured a regression on a netperf test with a
different configuration. Unfortunately, the tool regarded it as already
'reported', so we missed this report the first time.

We are sending it again now, FYI.


Hello,

kernel test robot noticed a -60.4% regression of netperf.Throughput_Mbps on:


commit: f5ddc662f07d7d99e9cfc5e07778e26c7394caf8 ("[PATCH -V3 3/9] mm, pcp: reduce lock contention for draining high-order pages")
url: https://github.com/intel-lab-lkp/linux/commits/Huang-Ying/mm-pcp-avoid-to-drain-PCP-when-process-exit/20231017-143633
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git 36b2d7dd5a8ac95c8c1e69bdc93c4a6e2dc28a23
patch link: https://lore.kernel.org/all/20231016053002.756205-4-ying.huang@intel.com/
patch subject: [PATCH -V3 3/9] mm, pcp: reduce lock contention for draining high-order pages

testcase: netperf
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 50%
	cluster: cs-localhost
	test: UDP_STREAM
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202311061311.8d63998-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231106/202311061311.8d63998-oliver.sang@intel.com
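
For readers checking the comparison table that follows: the %change column appears to be the relative change from the base commit (left column) to the patched commit (right column), while rows whose metric is itself a percentage (for example mpstat.cpu.all.idle%) seem to show an absolute difference in percentage points. The Python sketch below is only an illustration of that reading; the helper names pct_change and point_change are hypothetical and not part of the lkp tooling, and the inputs are the rounded means printed in the table.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from base to patched, in percent (assumed meaning of %change)."""
    return (patched - base) / base * 100.0


def point_change(base: float, patched: float) -> float:
    """Absolute difference, assumed for rows whose metric is already a percentage."""
    return patched - base


# Rounded means copied from the table (base commit, patched commit).
throughput = pct_change(70831, 28037)    # netperf.Throughput_Mbps
recv = pct_change(7673, 26832)           # netperf.ThroughputRecv_Mbps
idle = point_change(2.59, 8.79)          # mpstat.cpu.all.idle%

print(f"netperf.Throughput_Mbps:     {throughput:+.1f}%")
print(f"netperf.ThroughputRecv_Mbps: {recv:+.1f}%")
print(f"mpstat.cpu.all.idle%:        {idle:+.1f} points")
```

Because the table prints rounded means, a recomputed value can differ from the reported one in the last digit for some rows.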

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1-x86_64-20220510.cgz/300s/lkp-icl-2sp2/UDP_STREAM/netperf

commit: 
  c828e65251 ("cacheinfo: calculate size of per-CPU data cache slice")
  f5ddc662f0 ("mm, pcp: reduce lock contention for draining high-order pages")

c828e65251502516 f5ddc662f07d7d99e9cfc5e0777 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      7321 ±  4%     +28.2%       9382        uptime.idle
     50.65 ±  4%      -4.0%      48.64        boot-time.boot
      6042 ±  4%      -4.2%       5785        boot-time.idle
 1.089e+09 ±  2%    +232.1%  3.618e+09        cpuidle..time
   1087075 ±  2%  +24095.8%   2.63e+08        cpuidle..usage
   3357014           +99.9%    6710312        vmstat.memory.cache
     48731 ± 19%   +4666.5%    2322787        vmstat.system.cs
    144637          +711.2%    1173334        vmstat.system.in
      2.59 ±  2%      +6.2        8.79        mpstat.cpu.all.idle%
      1.01            +0.7        1.66        mpstat.cpu.all.irq%
      6.00            -3.2        2.79        mpstat.cpu.all.soft%
      1.13 ±  2%      -0.1        1.02        mpstat.cpu.all.usr%
 1.407e+09 ±  3%     -28.2%  1.011e+09        numa-numastat.node0.local_node
 1.407e+09 ±  3%     -28.2%   1.01e+09        numa-numastat.node0.numa_hit
 1.469e+09 ±  8%     -32.0%  9.979e+08        numa-numastat.node1.local_node
 1.469e+09 ±  8%     -32.1%  9.974e+08        numa-numastat.node1.numa_hit
    103.00 ± 19%     -44.0%      57.67 ± 20%  perf-c2c.DRAM.local
      8970 ± 12%     -89.4%     951.00 ±  4%  perf-c2c.DRAM.remote
      8192 ±  5%     +68.5%      13807        perf-c2c.HITM.local
      6675 ± 11%     -92.6%     491.00 ±  2%  perf-c2c.HITM.remote
   1051014 ±  2%  +24922.0%   2.63e+08        turbostat.C1
      2.75 ±  2%      +6.5        9.29        turbostat.C1%
      2.72 ±  2%    +178.3%       7.57        turbostat.CPU%c1
      0.09           -22.2%       0.07        turbostat.IPC
  44589125          +701.5%  3.574e+08        turbostat.IRQ
    313.00 ± 57%   +1967.0%       6469 ±  8%  turbostat.POLL
     70.33            +3.3%      72.67        turbostat.PkgTmp
     44.23 ±  4%     -31.8%      30.15 ±  2%  turbostat.RAMWatt
    536096          +583.7%    3665194        meminfo.Active
    535414          +584.4%    3664543        meminfo.Active(anon)
   3238301          +103.2%    6579677        meminfo.Cached
   1204424          +278.9%    4563575        meminfo.Committed_AS
    469093           +47.9%     693889 ±  3%  meminfo.Inactive
    467250           +48.4%     693496 ±  3%  meminfo.Inactive(anon)
     53615          +562.5%     355225 ±  4%  meminfo.Mapped
   5223078           +64.1%    8571212        meminfo.Memused
    557305          +599.6%    3899111        meminfo.Shmem
   5660207           +58.9%    8993642        meminfo.max_used_kB
     78504 ±  3%     -30.1%      54869        netperf.ThroughputBoth_Mbps
   5024292 ±  3%     -30.1%    3511666        netperf.ThroughputBoth_total_Mbps
      7673 ±  5%    +249.7%      26832        netperf.ThroughputRecv_Mbps
    491074 ±  5%    +249.7%    1717287        netperf.ThroughputRecv_total_Mbps
     70831 ±  2%     -60.4%      28037        netperf.Throughput_Mbps
   4533217 ±  2%     -60.4%    1794379        netperf.Throughput_total_Mbps
      5439            +9.4%       5949        netperf.time.percent_of_cpu_this_job_got
     16206            +9.4%      17728        netperf.time.system_time
    388.14           -51.9%     186.53        netperf.time.user_time
 2.876e+09 ±  3%     -30.1%   2.01e+09        netperf.workload
    177360 ± 30%     -36.0%     113450 ± 20%  numa-meminfo.node0.AnonPages
    255926 ± 12%     -40.6%     152052 ± 12%  numa-meminfo.node0.AnonPages.max
     22582 ± 61%    +484.2%     131916 ± 90%  numa-meminfo.node0.Mapped
    138287 ± 17%     +22.6%     169534 ± 12%  numa-meminfo.node1.AnonHugePages
    267468 ± 20%     +29.1%     345385 ±  6%  numa-meminfo.node1.AnonPages
    346204 ± 18%     +34.5%     465696 ±  2%  numa-meminfo.node1.AnonPages.max
    279416 ± 19%     +77.0%     494652 ± 18%  numa-meminfo.node1.Inactive
    278445 ± 19%     +77.6%     494393 ± 18%  numa-meminfo.node1.Inactive(anon)
     31726 ± 45%    +607.7%     224533 ± 45%  numa-meminfo.node1.Mapped
      4802 ±  6%     +19.4%       5733 ±  3%  numa-meminfo.node1.PageTables
    297323 ± 12%    +792.6%    2653850 ± 63%  numa-meminfo.node1.Shmem
     44325 ± 30%     -36.0%      28379 ± 20%  numa-vmstat.node0.nr_anon_pages
      5590 ± 61%    +491.0%      33042 ± 90%  numa-vmstat.node0.nr_mapped
 1.407e+09 ±  3%     -28.2%   1.01e+09        numa-vmstat.node0.numa_hit
 1.407e+09 ±  3%     -28.2%  1.011e+09        numa-vmstat.node0.numa_local
     66858 ± 20%     +29.2%      86385 ±  6%  numa-vmstat.node1.nr_anon_pages
     69601 ± 20%     +77.8%     123729 ± 18%  numa-vmstat.node1.nr_inactive_anon
      7953 ± 45%    +608.3%      56335 ± 45%  numa-vmstat.node1.nr_mapped
      1201 ±  6%     +19.4%       1434 ±  3%  numa-vmstat.node1.nr_page_table_pages
     74288 ± 11%    +792.6%     663111 ± 63%  numa-vmstat.node1.nr_shmem
     69601 ± 20%     +77.8%     123728 ± 18%  numa-vmstat.node1.nr_zone_inactive_anon
 1.469e+09 ±  8%     -32.1%  9.974e+08        numa-vmstat.node1.numa_hit
 1.469e+09 ±  8%     -32.0%  9.979e+08        numa-vmstat.node1.numa_local
    133919          +584.2%     916254        proc-vmstat.nr_active_anon
    111196            +3.3%     114828        proc-vmstat.nr_anon_pages
   5602484            -1.5%    5518799        proc-vmstat.nr_dirty_background_threshold
  11218668            -1.5%   11051092        proc-vmstat.nr_dirty_threshold
    809646          +103.2%    1645012        proc-vmstat.nr_file_pages
  56374629            -1.5%   55536913        proc-vmstat.nr_free_pages
    116775           +48.4%     173349 ±  3%  proc-vmstat.nr_inactive_anon
     13386 ±  2%    +563.3%      88793 ±  4%  proc-vmstat.nr_mapped
      2286            +6.5%       2434        proc-vmstat.nr_page_table_pages
    139393          +599.4%     974869        proc-vmstat.nr_shmem
     29092            +6.6%      31019        proc-vmstat.nr_slab_reclaimable
    133919          +584.2%     916254        proc-vmstat.nr_zone_active_anon
    116775           +48.4%     173349 ±  3%  proc-vmstat.nr_zone_inactive_anon
     32135 ± 11%    +257.2%     114797 ± 21%  proc-vmstat.numa_hint_faults
     20858 ± 16%    +318.3%      87244 ±  6%  proc-vmstat.numa_hint_faults_local
 2.876e+09 ±  3%     -30.2%  2.008e+09        proc-vmstat.numa_hit
 2.876e+09 ±  3%     -30.2%  2.008e+09        proc-vmstat.numa_local
     25453 ±  7%     -75.2%       6324 ± 30%  proc-vmstat.numa_pages_migrated
    178224 ±  2%     +76.6%     314680 ±  7%  proc-vmstat.numa_pte_updates
    160889 ±  3%    +267.6%     591393 ±  6%  proc-vmstat.pgactivate
 2.295e+10 ±  3%     -30.2%  1.601e+10        proc-vmstat.pgalloc_normal
   1026605           +21.9%    1251671        proc-vmstat.pgfault
 2.295e+10 ±  3%     -30.2%  1.601e+10        proc-vmstat.pgfree
     25453 ±  7%     -75.2%       6324 ± 30%  proc-vmstat.pgmigrate_success
     39208 ±  2%      -6.1%      36815        proc-vmstat.pgreuse
   3164416           -20.3%    2521344 ±  2%  proc-vmstat.unevictable_pgs_scanned
  19248627           -22.1%   14989905        sched_debug.cfs_rq:/.avg_vruntime.avg
  20722680           -24.9%   15569530        sched_debug.cfs_rq:/.avg_vruntime.max
  17634233           -22.5%   13663168        sched_debug.cfs_rq:/.avg_vruntime.min
    949063 ±  2%     -70.5%     280388        sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.78 ± 10%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_running.min
      0.16 ±  8%    +113.3%       0.33 ±  2%  sched_debug.cfs_rq:/.h_nr_running.stddev
      0.56 ±141%  +2.2e+07%     122016 ± 52%  sched_debug.cfs_rq:/.left_vruntime.avg
     45.01 ±141%  +2.2e+07%   10035976 ± 28%  sched_debug.cfs_rq:/.left_vruntime.max
      4.58 ±141%  +2.3e+07%    1072762 ± 36%  sched_debug.cfs_rq:/.left_vruntime.stddev
      5814 ± 10%    -100.0%       0.00        sched_debug.cfs_rq:/.load.min
      5.39 ±  9%     -73.2%       1.44 ± 10%  sched_debug.cfs_rq:/.load_avg.min
  19248627           -22.1%   14989905        sched_debug.cfs_rq:/.min_vruntime.avg
  20722680           -24.9%   15569530        sched_debug.cfs_rq:/.min_vruntime.max
  17634233           -22.5%   13663168        sched_debug.cfs_rq:/.min_vruntime.min
    949063 ±  2%     -70.5%     280388        sched_debug.cfs_rq:/.min_vruntime.stddev
      0.78 ± 10%    -100.0%       0.00        sched_debug.cfs_rq:/.nr_running.min
      0.06 ±  8%    +369.2%       0.30 ±  3%  sched_debug.cfs_rq:/.nr_running.stddev
      4.84 ± 26%   +1611.3%      82.79 ± 67%  sched_debug.cfs_rq:/.removed.load_avg.avg
     27.92 ± 12%   +3040.3%     876.79 ± 68%  sched_debug.cfs_rq:/.removed.load_avg.stddev
      0.56 ±141%  +2.2e+07%     122016 ± 52%  sched_debug.cfs_rq:/.right_vruntime.avg
     45.06 ±141%  +2.2e+07%   10035976 ± 28%  sched_debug.cfs_rq:/.right_vruntime.max
      4.59 ±141%  +2.3e+07%    1072762 ± 36%  sched_debug.cfs_rq:/.right_vruntime.stddev
    900.25           -10.4%     806.45        sched_debug.cfs_rq:/.runnable_avg.avg
    533.28 ±  4%     -87.0%      69.56 ± 39%  sched_debug.cfs_rq:/.runnable_avg.min
    122.77 ±  2%     +92.9%     236.86        sched_debug.cfs_rq:/.runnable_avg.stddev
    896.13           -10.8%     799.44        sched_debug.cfs_rq:/.util_avg.avg
    379.06 ±  4%     -83.4%      62.94 ± 37%  sched_debug.cfs_rq:/.util_avg.min
    116.35 ±  8%     +99.4%     232.04        sched_debug.cfs_rq:/.util_avg.stddev
    550.87           -14.2%     472.66 ±  2%  sched_debug.cfs_rq:/.util_est_enqueued.avg
      1124 ±  8%     +18.2%       1329 ±  3%  sched_debug.cfs_rq:/.util_est_enqueued.max
    134.17 ± 30%    -100.0%       0.00        sched_debug.cfs_rq:/.util_est_enqueued.min
    558243 ±  6%     -66.9%     184666        sched_debug.cpu.avg_idle.avg
     12860 ± 11%     -56.1%       5644        sched_debug.cpu.avg_idle.min
    365635           -53.5%     169863 ±  5%  sched_debug.cpu.avg_idle.stddev
      9.56 ±  3%     -28.4%       6.84 ±  8%  sched_debug.cpu.clock.stddev
      6999 ±  2%     -85.6%       1007 ±  3%  sched_debug.cpu.clock_task.stddev
      3985 ± 10%    -100.0%       0.00        sched_debug.cpu.curr->pid.min
    491.71 ± 10%    +209.3%       1520 ±  4%  sched_debug.cpu.curr->pid.stddev
    270.19 ±141%   +1096.6%       3233 ± 51%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.78 ± 10%    -100.0%       0.00        sched_debug.cpu.nr_running.min
      0.15 ±  6%    +121.7%       0.34 ±  2%  sched_debug.cpu.nr_running.stddev
     62041 ± 15%   +4280.9%    2717948        sched_debug.cpu.nr_switches.avg
   1074922 ± 14%    +292.6%    4220307 ±  2%  sched_debug.cpu.nr_switches.max
      1186 ±  2%  +1.2e+05%    1379073 ±  4%  sched_debug.cpu.nr_switches.min
    132392 ± 21%    +294.6%     522476 ±  5%  sched_debug.cpu.nr_switches.stddev
      6.44 ±  4%     +21.4%       7.82 ± 12%  sched_debug.cpu.nr_uninterruptible.stddev
      6.73 ± 13%     -84.8%       1.02 ±  5%  perf-stat.i.MPKI
 1.652e+10 ±  2%     -22.2%  1.285e+10        perf-stat.i.branch-instructions
      0.72            +0.0        0.75        perf-stat.i.branch-miss-rate%
  1.19e+08 ±  3%     -19.8%   95493630        perf-stat.i.branch-misses
     27.46 ± 12%     -26.2        1.30 ±  4%  perf-stat.i.cache-miss-rate%
 5.943e+08 ± 10%     -88.6%   67756219 ±  5%  perf-stat.i.cache-misses
 2.201e+09          +143.7%  5.364e+09        perf-stat.i.cache-references
     48911 ± 19%   +4695.4%    2345525        perf-stat.i.context-switches
      3.66 ±  2%     +28.5%       4.71        perf-stat.i.cpi
 3.228e+11            -4.1%  3.097e+11        perf-stat.i.cpu-cycles
    190.51         +1363.7%       2788 ± 10%  perf-stat.i.cpu-migrations
    803.99 ±  6%    +510.2%       4905 ±  5%  perf-stat.i.cycles-between-cache-misses
      0.00 ± 16%      +0.0        0.01 ± 14%  perf-stat.i.dTLB-load-miss-rate%
    755654 ± 18%    +232.4%    2512024 ± 14%  perf-stat.i.dTLB-load-misses
 2.385e+10 ±  2%     -26.9%  1.742e+10        perf-stat.i.dTLB-loads
      0.00 ± 31%      +0.0        0.01 ± 35%  perf-stat.i.dTLB-store-miss-rate%
    305657 ± 36%    +200.0%     916822 ± 35%  perf-stat.i.dTLB-store-misses
 1.288e+10 ±  2%     -28.8%  9.179e+09        perf-stat.i.dTLB-stores
 8.789e+10 ±  2%     -25.2%  6.578e+10        perf-stat.i.instructions
      0.28 ±  2%     -21.6%       0.22        perf-stat.i.ipc
      2.52            -4.1%       2.42        perf-stat.i.metric.GHz
    873.89 ± 12%     -67.0%     288.04 ±  8%  perf-stat.i.metric.K/sec
    435.61 ±  2%     -19.6%     350.06        perf-stat.i.metric.M/sec
      2799           +29.9%       3637 ±  2%  perf-stat.i.minor-faults
     99.74            -2.6       97.11        perf-stat.i.node-load-miss-rate%
 1.294e+08 ± 12%     -92.4%    9879207 ±  7%  perf-stat.i.node-load-misses
     76.55           +16.4       92.92        perf-stat.i.node-store-miss-rate%
 2.257e+08 ± 10%     -90.4%   21721672 ±  8%  perf-stat.i.node-store-misses
  69217511 ± 13%     -97.7%    1625810 ±  7%  perf-stat.i.node-stores
      2799           +29.9%       3637 ±  2%  perf-stat.i.page-faults
      6.79 ± 13%     -84.9%       1.03 ±  5%  perf-stat.overall.MPKI
      0.72            +0.0        0.74        perf-stat.overall.branch-miss-rate%
     27.06 ± 12%     -25.8        1.26 ±  4%  perf-stat.overall.cache-miss-rate%
      3.68 ±  2%     +28.1%       4.71        perf-stat.overall.cpi
    549.38 ± 10%    +736.0%       4592 ±  5%  perf-stat.overall.cycles-between-cache-misses
      0.00 ± 18%      +0.0        0.01 ± 14%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ± 36%      +0.0        0.01 ± 35%  perf-stat.overall.dTLB-store-miss-rate%
      0.27 ±  2%     -22.0%       0.21        perf-stat.overall.ipc
     99.80            -2.4       97.37        perf-stat.overall.node-load-miss-rate%
     76.60           +16.4       93.03        perf-stat.overall.node-store-miss-rate%
      9319            +5.8%       9855        perf-stat.overall.path-length
 1.646e+10 ±  2%     -22.2%  1.281e+10        perf-stat.ps.branch-instructions
 1.186e+08 ±  3%     -19.8%   95167897        perf-stat.ps.branch-misses
 5.924e+08 ± 10%     -88.6%   67384354 ±  5%  perf-stat.ps.cache-misses
 2.193e+09          +143.4%  5.339e+09        perf-stat.ps.cache-references
     49100 ± 19%   +4668.0%    2341074        perf-stat.ps.context-switches
 3.218e+11            -4.1%  3.087e+11        perf-stat.ps.cpu-cycles
    189.73         +1368.4%       2786 ± 10%  perf-stat.ps.cpu-migrations
    753056 ± 18%    +229.9%    2484575 ± 14%  perf-stat.ps.dTLB-load-misses
 2.377e+10 ±  2%     -26.9%  1.737e+10        perf-stat.ps.dTLB-loads
    304509 ± 36%    +199.1%     910856 ± 35%  perf-stat.ps.dTLB-store-misses
 1.284e+10 ±  2%     -28.7%  9.152e+09        perf-stat.ps.dTLB-stores
  8.76e+10 ±  2%     -25.2%  6.557e+10        perf-stat.ps.instructions
      2791           +28.2%       3580 ±  2%  perf-stat.ps.minor-faults
  1.29e+08 ± 12%     -92.4%    9815672 ±  7%  perf-stat.ps.node-load-misses
  2.25e+08 ± 10%     -90.4%   21575943 ±  8%  perf-stat.ps.node-store-misses
  69002373 ± 13%     -97.7%    1615410 ±  7%  perf-stat.ps.node-stores
      2791           +28.2%       3580 ±  2%  perf-stat.ps.page-faults
  2.68e+13 ±  2%     -26.1%  1.981e+13        perf-stat.total.instructions
      0.00 ± 35%   +2600.0%       0.04 ± 23%  perf-sched.sch_delay.avg.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
      1.18 ±  9%     -98.1%       0.02 ± 32%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.58 ±  3%     -62.1%       0.22 ± 97%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.51 ± 22%     -82.7%       0.09 ± 11%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      0.25 ± 23%     -59.6%       0.10 ± 10%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.03 ± 42%     -64.0%       0.01 ± 15%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.04 ±  7%    +434.6%       0.23 ± 36%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      1.00 ± 20%     -84.1%       0.16 ± 78%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.01 ±  7%     -70.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
      0.02 ±  2%    +533.9%       0.12 ± 43%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.03 ±  7%    +105.9%       0.06 ± 33%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 15%     +67.5%       0.02 ±  8%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.09 ± 50%     -85.7%       0.01 ± 33%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
      0.04 ±  7%    +343.4%       0.16 ±  6%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.06 ± 41%   +3260.7%       1.88 ± 30%  perf-sched.sch_delay.max.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
      3.78           -96.2%       0.14 ±  3%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      2.86 ±  4%     -72.6%       0.78 ±113%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.09 ±  7%     -34.1%       2.69 ±  7%  perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      3.09 ± 37%     -64.1%       1.11 ±  5%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.00 ±141%   +6200.0%       0.13 ± 82%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      3.94           -40.5%       2.35 ± 48%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1.63 ± 21%     -77.0%       0.38 ± 90%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      7.29 ± 39%    +417.5%      37.72 ± 16%  perf-sched.sch_delay.max.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
      3.35 ± 14%     -51.7%       1.62 ±  3%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.05 ± 13%   +2245.1%       1.13 ± 40%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      3.01 ± 26%    +729.6%      25.01 ± 91%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1.93 ± 59%     -85.5%       0.28 ± 62%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
      0.01           -50.0%       0.00        perf-sched.total_sch_delay.average.ms
      7.29 ± 39%    +468.8%      41.46 ± 26%  perf-sched.total_sch_delay.max.ms
      6.04 ±  4%     -94.1%       0.35        perf-sched.total_wait_and_delay.average.ms
    205790 ±  3%   +1811.0%    3932742        perf-sched.total_wait_and_delay.count.ms
      6.03 ±  4%     -94.2%       0.35        perf-sched.total_wait_time.average.ms
     75.51 ± 41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     23.01 ± 17%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
     23.82 ±  7%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
     95.27 ± 41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
     55.86 ±141%   +1014.6%     622.64 ±  5%  perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.07 ± 23%     -82.5%       0.01        perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
    137.41 ±  3%    +345.1%     611.63 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.04 ±  5%     -49.6%       0.02        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
    536.33 ±  5%     -46.5%     287.00        perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     21.67 ± 32%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      5.67 ±  8%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      1.67 ± 56%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      5.67 ± 29%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
      5.33 ± 23%     +93.8%      10.33 ± 25%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    101725 ±  3%     +15.3%     117243 ± 10%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
    100.00 ±  7%     -80.3%      19.67 ±  2%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     97762 ±  4%   +3794.8%    3807606        perf-sched.wait_and_delay.count.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
      1091 ±  9%    +111.9%       2311 ±  3%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    604.50 ± 43%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     37.41 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
     27.08 ± 13%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
    275.41 ± 32%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
      1313 ± 69%    +112.1%       2786 ± 15%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    333.38 ±141%    +200.4%       1001        perf-sched.wait_and_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      1000           -96.8%      31.85 ± 48%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
     17.99 ± 33%    +387.5%      87.71 ±  8%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
      0.33 ± 19%     -74.1%       0.09 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
      0.02 ± 53%    +331.4%       0.10 ± 50%  perf-sched.wait_time.avg.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
      0.09 ± 65%     -75.9%       0.02 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
      0.02 ± 22%     -70.2%       0.01 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     75.51 ± 41%    -100.0%       0.04 ± 42%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.10 ± 36%     -80.3%       0.02 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      0.55 ± 61%     -94.9%       0.03 ± 45%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     23.01 ± 17%    -100.0%       0.00 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
     23.82 ±  7%     -99.7%       0.07 ± 57%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
     95.27 ± 41%    -100.0%       0.03 ± 89%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
     56.30 ±139%   +1005.5%     622.44 ±  5%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      2.78 ± 66%     -98.2%       0.05 ± 52%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      0.07 ± 23%     -82.5%       0.01        perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
    137.37 ±  3%    +345.1%     611.40 ±  2%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.02 ±  5%     -41.9%       0.01 ±  3%  perf-sched.wait_time.avg.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
    536.32 ±  5%     -46.5%     286.98        perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      4.66 ± 20%     -56.7%       2.02 ± 26%  perf-sched.wait_time.max.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
      0.03 ± 63%    +995.0%       0.37 ± 26%  perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
      1.67 ± 87%     -92.6%       0.12 ± 57%  perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
      0.54 ±117%     -95.1%       0.03 ±105%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.06 ± 49%     -89.1%       0.01 ±141%  perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
    604.50 ± 43%    -100.0%       0.16 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      2.77 ± 45%     -95.4%       0.13 ± 64%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      2.86 ± 45%     -94.3%       0.16 ± 91%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     37.41 ±  9%    -100.0%       0.01 ±141%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
     27.08 ± 13%     -99.7%       0.08 ± 61%  perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
    275.41 ± 32%    -100.0%       0.03 ± 89%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
      1313 ± 69%    +112.1%       2786 ± 15%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    334.74 ±140%    +198.9%       1000        perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     21.74 ± 58%     -95.4%       1.00 ±103%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      1000           -97.6%      24.49 ± 50%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
     10.90 ± 27%    +682.9%      85.36 ±  6%  perf-sched.wait_time.max.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
     32.91 ± 58%     -63.5%      12.01 ±115%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    169.97 ±  7%     -49.2%      86.29 ± 15%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     44.08           -19.8       24.25        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.free_pcppages_bulk.free_unref_page.skb_release_data.__consume_stateless_skb
     44.47           -19.6       24.87        perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg
     43.63           -19.5       24.15        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.free_pcppages_bulk.free_unref_page.skb_release_data
     45.62           -19.2       26.39        perf-profile.calltrace.cycles-pp.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg
     45.62           -19.2       26.40        perf-profile.calltrace.cycles-pp.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
     45.00           -19.1       25.94        perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg
     50.41           -16.8       33.64 ± 39%  perf-profile.calltrace.cycles-pp.accept_connections.main.__libc_start_main
     50.41           -16.8       33.64 ± 39%  perf-profile.calltrace.cycles-pp.accept_connection.accept_connections.main.__libc_start_main
     50.41           -16.8       33.64 ± 39%  perf-profile.calltrace.cycles-pp.spawn_child.accept_connection.accept_connections.main.__libc_start_main
     50.41           -16.8       33.64 ± 39%  perf-profile.calltrace.cycles-pp.process_requests.spawn_child.accept_connection.accept_connections.main
     99.92           -14.2       85.72 ± 15%  perf-profile.calltrace.cycles-pp.main.__libc_start_main
     99.96           -14.2       85.77 ± 15%  perf-profile.calltrace.cycles-pp.__libc_start_main
     50.10            -8.6       41.52        perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
     50.11            -8.6       41.55        perf-profile.calltrace.cycles-pp.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
     50.13            -8.5       41.64        perf-profile.calltrace.cycles-pp.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     50.28            -8.0       42.27        perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom
     50.29            -8.0       42.29        perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni
     50.31            -7.9       42.42        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests
     50.32            -7.8       42.47        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests.spawn_child
     50.36            -7.6       42.78        perf-profile.calltrace.cycles-pp.recvfrom.recv_omni.process_requests.spawn_child.accept_connection
     50.41            -7.3       43.07        perf-profile.calltrace.cycles-pp.recv_omni.process_requests.spawn_child.accept_connection.accept_connections
     19.93 ±  2%      -6.6       13.36        perf-profile.calltrace.cycles-pp.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
     19.44 ±  2%      -6.3       13.16        perf-profile.calltrace.cycles-pp._copy_from_iter.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg
     18.99 ±  2%      -6.1       12.90        perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.ip_generic_getfrag.__ip_append_data.ip_make_skb
      8.95            -5.1        3.82        perf-profile.calltrace.cycles-pp.udp_send_skb.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
      8.70            -5.0        3.71        perf-profile.calltrace.cycles-pp.ip_send_skb.udp_send_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
      8.10            -4.6        3.45        perf-profile.calltrace.cycles-pp.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg.sock_sendmsg
      7.69            -4.4        3.27        perf-profile.calltrace.cycles-pp.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg
      6.51            -3.7        2.78        perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb
      6.47            -3.7        2.75        perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb
      6.41            -3.7        2.71        perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
      5.88            -3.5        2.43        perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
      5.73            -3.4        2.35        perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
      5.69            -3.4        2.33        perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
      5.36            -3.2        2.19        perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
      4.59            -2.7        1.89        perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action
      4.55 ±  2%      -2.7        1.88        perf-profile.calltrace.cycles-pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll
      4.40 ±  2%      -2.6        1.81        perf-profile.calltrace.cycles-pp.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog
      3.81 ±  2%      -2.2        1.57        perf-profile.calltrace.cycles-pp.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
      3.75 ±  2%      -2.2        1.55        perf-profile.calltrace.cycles-pp.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
      2.21 ±  2%      -1.6        0.63        perf-profile.calltrace.cycles-pp.__ip_make_skb.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
      1.94 ±  2%      -1.4        0.51 ±  2%  perf-profile.calltrace.cycles-pp.__ip_select_ident.__ip_make_skb.ip_make_skb.udp_sendmsg.sock_sendmsg
      1.14            -0.6        0.51        perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
      0.00            +0.5        0.53 ±  2%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      0.00            +0.7        0.69        perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      0.00            +0.7        0.71        perf-profile.calltrace.cycles-pp.__sk_mem_raise_allocated.__sk_mem_schedule.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb
      0.00            +0.7        0.72        perf-profile.calltrace.cycles-pp.__sk_mem_schedule.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv
      0.00            +1.0        0.99 ± 20%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp
      0.00            +1.0        1.01 ± 20%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
      0.00            +1.1        1.05 ± 20%  perf-profile.calltrace.cycles-pp.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg
      0.00            +1.1        1.12        perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      0.00            +1.2        1.18 ± 20%  perf-profile.calltrace.cycles-pp.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg.sock_recvmsg
      0.00            +1.3        1.32        perf-profile.calltrace.cycles-pp.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu
      0.00            +2.2        2.23        perf-profile.calltrace.cycles-pp.__skb_recv_udp.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
     49.51            +2.6       52.08        perf-profile.calltrace.cycles-pp.send_udp_stream.main.__libc_start_main
     49.49            +2.6       52.07        perf-profile.calltrace.cycles-pp.send_omni_inner.send_udp_stream.main.__libc_start_main
      0.00            +3.0        2.96 ±  2%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     48.71            +3.0       51.73        perf-profile.calltrace.cycles-pp.sendto.send_omni_inner.send_udp_stream.main.__libc_start_main
      0.00            +3.1        3.06 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      0.00            +3.1        3.09        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     48.34            +3.2       51.56        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream.main
      0.00            +3.3        3.33 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     48.13            +3.8       51.96        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream
     47.82            +4.0       51.82        perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner
     47.70            +4.1       51.76        perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto
      0.00            +4.1        4.08        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      0.00            +4.1        4.10        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      0.00            +4.1        4.10        perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
      0.00            +4.1        4.14        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
      0.00            +4.3        4.35 ±  2%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
     46.52            +4.8       51.27        perf-profile.calltrace.cycles-pp.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
     46.04            +5.0       51.08        perf-profile.calltrace.cycles-pp.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
      3.67            +8.0       11.63        perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg
      3.71            +8.1       11.80        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
      3.96            +8.5       12.42        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg
      3.96            +8.5       12.44        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
     35.13           +11.3       46.39        perf-profile.calltrace.cycles-pp.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
     32.68 ±  2%     +13.0       45.65        perf-profile.calltrace.cycles-pp.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
     10.27           +20.3       30.59        perf-profile.calltrace.cycles-pp.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
     10.24           +20.3       30.58        perf-profile.calltrace.cycles-pp.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg
      9.84           +20.5       30.32        perf-profile.calltrace.cycles-pp.__alloc_pages.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb
      9.59           +20.5       30.11        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data
      8.40           +21.0       29.42        perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.skb_page_frag_refill.sk_page_frag_refill
      6.13           +21.9       28.05        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist
      6.20           +22.0       28.15        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
      6.46           +22.5       28.91        perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.skb_page_frag_refill
     48.24           -21.8       26.43        perf-profile.children.cycles-pp.skb_release_data
     47.19           -21.2       25.98        perf-profile.children.cycles-pp.free_unref_page
     44.48           -19.6       24.88        perf-profile.children.cycles-pp.free_pcppages_bulk
     45.62           -19.2       26.40        perf-profile.children.cycles-pp.__consume_stateless_skb
     99.95           -14.2       85.76 ± 15%  perf-profile.children.cycles-pp.main
     99.96           -14.2       85.77 ± 15%  perf-profile.children.cycles-pp.__libc_start_main
     50.10            -8.6       41.53        perf-profile.children.cycles-pp.udp_recvmsg
     50.11            -8.6       41.56        perf-profile.children.cycles-pp.inet_recvmsg
     50.13            -8.5       41.65        perf-profile.children.cycles-pp.sock_recvmsg
     50.29            -8.0       42.28        perf-profile.children.cycles-pp.__sys_recvfrom
     50.29            -8.0       42.30        perf-profile.children.cycles-pp.__x64_sys_recvfrom
     50.38            -7.5       42.86        perf-profile.children.cycles-pp.recvfrom
     50.41            -7.3       43.07        perf-profile.children.cycles-pp.accept_connections
     50.41            -7.3       43.07        perf-profile.children.cycles-pp.accept_connection
     50.41            -7.3       43.07        perf-profile.children.cycles-pp.spawn_child
     50.41            -7.3       43.07        perf-profile.children.cycles-pp.process_requests
     50.41            -7.3       43.07        perf-profile.children.cycles-pp.recv_omni
     19.96 ±  2%      -6.5       13.50        perf-profile.children.cycles-pp.ip_generic_getfrag
     19.46 ±  2%      -6.2       13.28        perf-profile.children.cycles-pp._copy_from_iter
     19.21 ±  2%      -6.1       13.14        perf-profile.children.cycles-pp.copyin
      8.96            -5.1        3.86        perf-profile.children.cycles-pp.udp_send_skb
      8.72            -5.0        3.75        perf-profile.children.cycles-pp.ip_send_skb
      8.11            -4.6        3.49        perf-profile.children.cycles-pp.ip_finish_output2
      7.72            -4.4        3.32        perf-profile.children.cycles-pp.__dev_queue_xmit
     98.71            -4.1       94.59        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     98.51            -4.0       94.46        perf-profile.children.cycles-pp.do_syscall_64
      6.49            -3.7        2.78        perf-profile.children.cycles-pp.do_softirq
      6.51            -3.7        2.82        perf-profile.children.cycles-pp.__local_bh_enable_ip
      6.43            -3.7        2.78        perf-profile.children.cycles-pp.__do_softirq
      5.90            -3.4        2.46        perf-profile.children.cycles-pp.net_rx_action
      5.74            -3.4        2.38        perf-profile.children.cycles-pp.__napi_poll
      5.71            -3.4        2.36        perf-profile.children.cycles-pp.process_backlog
      5.37            -3.2        2.21        perf-profile.children.cycles-pp.__netif_receive_skb_one_core
      4.60            -2.7        1.91        perf-profile.children.cycles-pp.ip_local_deliver_finish
      4.57 ±  2%      -2.7        1.90        perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
      4.42 ±  2%      -2.6        1.83        perf-profile.children.cycles-pp.__udp4_lib_rcv
      3.82 ±  2%      -2.2        1.58 ±  2%  perf-profile.children.cycles-pp.udp_unicast_rcv_skb
      3.78 ±  2%      -2.2        1.57 ±  2%  perf-profile.children.cycles-pp.udp_queue_rcv_one_skb
      2.23 ±  2%      -1.6        0.65 ±  2%  perf-profile.children.cycles-pp.__ip_make_skb
      1.95 ±  2%      -1.4        0.52 ±  3%  perf-profile.children.cycles-pp.__ip_select_ident
      1.51 ±  4%      -1.2        0.34        perf-profile.children.cycles-pp.free_unref_page_commit
      1.17            -0.7        0.51 ±  2%  perf-profile.children.cycles-pp.ip_route_output_flow
      1.15            -0.6        0.52        perf-profile.children.cycles-pp.sock_alloc_send_pskb
      0.91            -0.5        0.39        perf-profile.children.cycles-pp.alloc_skb_with_frags
      0.86            -0.5        0.37        perf-profile.children.cycles-pp.__alloc_skb
      0.83            -0.5        0.36 ±  2%  perf-profile.children.cycles-pp.ip_route_output_key_hash_rcu
      0.75            -0.4        0.32        perf-profile.children.cycles-pp.dev_hard_start_xmit
      0.72            -0.4        0.31 ±  3%  perf-profile.children.cycles-pp.fib_table_lookup
      0.67            -0.4        0.28        perf-profile.children.cycles-pp.loopback_xmit
      0.70 ±  2%      -0.4        0.33        perf-profile.children.cycles-pp.__zone_watermark_ok
      0.47 ±  4%      -0.3        0.15        perf-profile.children.cycles-pp.kmem_cache_free
      0.57            -0.3        0.26        perf-profile.children.cycles-pp.kmem_cache_alloc_node
      0.46            -0.3        0.18 ±  2%  perf-profile.children.cycles-pp.ip_rcv
      0.42            -0.3        0.17        perf-profile.children.cycles-pp.move_addr_to_kernel
      0.41            -0.2        0.16 ±  2%  perf-profile.children.cycles-pp.__udp4_lib_lookup
      0.32            -0.2        0.13        perf-profile.children.cycles-pp.__netif_rx
      0.30            -0.2        0.12        perf-profile.children.cycles-pp.netif_rx_internal
      0.30            -0.2        0.12        perf-profile.children.cycles-pp._copy_from_user
      0.31            -0.2        0.13        perf-profile.children.cycles-pp.kmalloc_reserve
      0.63            -0.2        0.46 ±  2%  perf-profile.children.cycles-pp.free_unref_page_prepare
      0.28            -0.2        0.11        perf-profile.children.cycles-pp.enqueue_to_backlog
      0.27            -0.2        0.11        perf-profile.children.cycles-pp.udp4_lib_lookup2
      0.29            -0.2        0.13 ±  6%  perf-profile.children.cycles-pp.send_data
      0.25            -0.2        0.10        perf-profile.children.cycles-pp.__netif_receive_skb_core
      0.23 ±  2%      -0.1        0.10 ±  4%  perf-profile.children.cycles-pp.security_socket_sendmsg
      0.19 ±  2%      -0.1        0.06        perf-profile.children.cycles-pp.ip_rcv_core
      0.37            -0.1        0.24        perf-profile.children.cycles-pp.irqtime_account_irq
      0.21            -0.1        0.08        perf-profile.children.cycles-pp.sock_wfree
      0.21 ±  3%      -0.1        0.08        perf-profile.children.cycles-pp.validate_xmit_skb
      0.20 ±  2%      -0.1        0.08        perf-profile.children.cycles-pp.ip_output
      0.22 ±  2%      -0.1        0.10 ±  4%  perf-profile.children.cycles-pp.ip_rcv_finish_core
      0.20 ±  6%      -0.1        0.09 ±  5%  perf-profile.children.cycles-pp.__mkroute_output
      0.21 ±  2%      -0.1        0.09 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.28            -0.1        0.18        perf-profile.children.cycles-pp._raw_spin_trylock
      0.34 ±  3%      -0.1        0.25        perf-profile.children.cycles-pp.__slab_free
      0.13 ±  3%      -0.1        0.05        perf-profile.children.cycles-pp.siphash_3u32
      0.12 ±  4%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.ipv4_pktinfo_prepare
      0.14 ±  3%      -0.1        0.06 ±  7%  perf-profile.children.cycles-pp.__ip_local_out
      0.20 ±  2%      -0.1        0.12        perf-profile.children.cycles-pp.aa_sk_perm
      0.18 ±  2%      -0.1        0.10        perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.12 ±  3%      -0.1        0.05        perf-profile.children.cycles-pp.sk_filter_trim_cap
      0.13            -0.1        0.06        perf-profile.children.cycles-pp.ip_setup_cork
      0.13 ±  7%      -0.1        0.06 ±  8%  perf-profile.children.cycles-pp.fib_lookup_good_nhc
      0.15 ±  3%      -0.1        0.08 ±  5%  perf-profile.children.cycles-pp.skb_set_owner_w
      0.11 ±  4%      -0.1        0.05        perf-profile.children.cycles-pp.dst_release
      0.23 ±  2%      -0.1        0.17 ±  2%  perf-profile.children.cycles-pp.__entry_text_start
      0.11            -0.1        0.05        perf-profile.children.cycles-pp.ipv4_mtu
      0.20 ±  2%      -0.1        0.15 ±  3%  perf-profile.children.cycles-pp.__list_add_valid_or_report
      0.10            -0.1        0.05        perf-profile.children.cycles-pp.ip_send_check
      0.31 ±  2%      -0.0        0.26 ±  3%  perf-profile.children.cycles-pp.sockfd_lookup_light
      0.27            -0.0        0.22 ±  2%  perf-profile.children.cycles-pp.__fget_light
      0.63            -0.0        0.58        perf-profile.children.cycles-pp.__check_object_size
      0.15 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.13            -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.alloc_pages
      0.27            -0.0        0.24        perf-profile.children.cycles-pp.sched_clock_cpu
      0.11 ±  4%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.__cond_resched
      0.14 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.free_tail_page_prepare
      0.11            -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.09 ±  9%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.__xfrm_policy_check2
      0.23 ±  2%      -0.0        0.21 ±  2%  perf-profile.children.cycles-pp.sched_clock
      0.14 ±  3%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.prep_compound_page
      0.21 ±  2%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.native_sched_clock
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.task_tick_fair
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.check_stack_object
      0.18 ±  2%      +0.0        0.20 ±  2%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.18 ±  2%      +0.0        0.19 ±  2%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
      0.31 ±  3%      +0.0        0.33        perf-profile.children.cycles-pp.tick_sched_handle
      0.31 ±  3%      +0.0        0.33        perf-profile.children.cycles-pp.update_process_times
      0.41 ±  2%      +0.0        0.43        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.40 ±  2%      +0.0        0.42        perf-profile.children.cycles-pp.hrtimer_interrupt
      0.32 ±  2%      +0.0        0.34        perf-profile.children.cycles-pp.tick_sched_timer
      0.36 ±  2%      +0.0        0.39        perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.06 ±  7%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.05 ±  8%      +0.0        0.10        perf-profile.children.cycles-pp._raw_spin_lock_bh
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.update_cfs_group
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.cpuidle_governor_latency_req
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.00            +0.1        0.05 ±  8%  perf-profile.children.cycles-pp.prepare_to_wait_exclusive
      0.07            +0.1        0.13 ±  3%  perf-profile.children.cycles-pp.__mod_zone_page_state
      0.00            +0.1        0.06 ± 13%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.__x2apic_send_IPI_dest
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.security_socket_recvmsg
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.select_task_rq_fair
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.tick_irq_enter
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.tick_nohz_idle_enter
      0.42 ±  2%      +0.1        0.49 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.00            +0.1        0.07 ±  7%  perf-profile.children.cycles-pp.ktime_get
      0.00            +0.1        0.07        perf-profile.children.cycles-pp.__get_user_4
      0.00            +0.1        0.07        perf-profile.children.cycles-pp.update_rq_clock
      0.00            +0.1        0.07        perf-profile.children.cycles-pp.select_task_rq
      0.00            +0.1        0.07        perf-profile.children.cycles-pp.native_apic_msr_eoi
      0.49            +0.1        0.57 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.11 ± 11%      +0.1        0.19 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
      0.00            +0.1        0.08        perf-profile.children.cycles-pp.update_rq_clock_task
      0.00            +0.1        0.08        perf-profile.children.cycles-pp.__update_load_avg_se
      0.00            +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.irq_enter_rcu
      0.00            +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +0.1        0.09        perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.00            +0.1        0.09        perf-profile.children.cycles-pp.update_blocked_averages
      0.00            +0.1        0.09        perf-profile.children.cycles-pp.update_sg_lb_stats
      0.00            +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.set_next_entity
      0.00            +0.1        0.10        perf-profile.children.cycles-pp.__switch_to_asm
      0.00            +0.1        0.11 ± 12%  perf-profile.children.cycles-pp._copy_to_user
      0.00            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.menu_select
      0.00            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.recv_data
      0.00            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.00            +0.1        0.13 ±  3%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.00            +0.1        0.13 ±  3%  perf-profile.children.cycles-pp.__switch_to
      0.00            +0.1        0.13 ±  3%  perf-profile.children.cycles-pp.find_busiest_group
      0.00            +0.1        0.14        perf-profile.children.cycles-pp.finish_task_switch
      0.00            +0.1        0.15 ±  3%  perf-profile.children.cycles-pp.update_curr
      0.00            +0.2        0.15 ±  3%  perf-profile.children.cycles-pp.mem_cgroup_uncharge_skmem
      0.00            +0.2        0.16        perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.05            +0.2        0.22 ±  2%  perf-profile.children.cycles-pp.page_counter_try_charge
      0.00            +0.2        0.17 ±  2%  perf-profile.children.cycles-pp.load_balance
      0.00            +0.2        0.17 ±  2%  perf-profile.children.cycles-pp.___perf_sw_event
      0.02 ±141%      +0.2        0.19 ±  2%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.33            +0.2        0.52        perf-profile.children.cycles-pp.__free_one_page
      0.02 ±141%      +0.2        0.21 ±  2%  perf-profile.children.cycles-pp.drain_stock
      0.00            +0.2        0.20 ±  2%  perf-profile.children.cycles-pp.prepare_task_switch
      0.16 ±  3%      +0.2        0.38 ±  2%  perf-profile.children.cycles-pp.simple_copy_to_iter
      0.07 ± 11%      +0.2        0.31        perf-profile.children.cycles-pp.refill_stock
      0.07 ±  6%      +0.2        0.31 ±  4%  perf-profile.children.cycles-pp.move_addr_to_user
      0.00            +0.2        0.24        perf-profile.children.cycles-pp.enqueue_entity
      0.00            +0.2        0.25        perf-profile.children.cycles-pp.update_load_avg
      0.21 ±  2%      +0.3        0.48        perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
      0.00            +0.3        0.31 ±  4%  perf-profile.children.cycles-pp.dequeue_entity
      0.08 ±  5%      +0.3        0.40 ±  3%  perf-profile.children.cycles-pp.try_charge_memcg
      0.00            +0.3        0.33        perf-profile.children.cycles-pp.enqueue_task_fair
      0.00            +0.4        0.35 ±  2%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.00            +0.4        0.35 ±  2%  perf-profile.children.cycles-pp.activate_task
      0.00            +0.4        0.36 ±  2%  perf-profile.children.cycles-pp.try_to_wake_up
      0.00            +0.4        0.37 ±  2%  perf-profile.children.cycles-pp.autoremove_wake_function
      0.00            +0.4        0.39 ±  3%  perf-profile.children.cycles-pp.newidle_balance
      0.12 ±  8%      +0.4        0.51 ±  2%  perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
      0.00            +0.4        0.39        perf-profile.children.cycles-pp.ttwu_do_activate
      0.00            +0.4        0.40 ±  2%  perf-profile.children.cycles-pp.__wake_up_common
      0.18 ±  4%      +0.4        0.59        perf-profile.children.cycles-pp.udp_rmem_release
      0.11 ±  7%      +0.4        0.52        perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
      0.00            +0.4        0.43        perf-profile.children.cycles-pp.__wake_up_common_lock
      0.00            +0.5        0.46        perf-profile.children.cycles-pp.sched_ttwu_pending
      0.00            +0.5        0.49        perf-profile.children.cycles-pp.sock_def_readable
      0.00            +0.5        0.53 ±  2%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.00            +0.5        0.54 ±  2%  perf-profile.children.cycles-pp.schedule_idle
      0.00            +0.6        0.55        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.15 ±  3%      +0.6        0.73 ±  2%  perf-profile.children.cycles-pp.__sk_mem_raise_allocated
      0.00            +0.6        0.57        perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.16 ±  5%      +0.6        0.74 ±  2%  perf-profile.children.cycles-pp.__sk_mem_schedule
      0.00            +0.8        0.78        perf-profile.children.cycles-pp.sysvec_call_function_single
      0.41 ±  3%      +0.9        1.33 ±  2%  perf-profile.children.cycles-pp.__udp_enqueue_schedule_skb
      0.00            +1.2        1.16 ±  2%  perf-profile.children.cycles-pp.schedule
      0.00            +1.2        1.21 ±  2%  perf-profile.children.cycles-pp.schedule_timeout
      0.00            +1.3        1.33 ±  2%  perf-profile.children.cycles-pp.__skb_wait_for_more_packets
      0.00            +1.7        1.66 ±  2%  perf-profile.children.cycles-pp.__schedule
      0.27 ±  3%      +2.0        2.25        perf-profile.children.cycles-pp.__skb_recv_udp
     50.41            +2.4       52.81        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.00            +2.7        2.68        perf-profile.children.cycles-pp.asm_sysvec_call_function_single
     49.78            +2.7       52.49        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.00            +3.0        2.98        perf-profile.children.cycles-pp.acpi_safe_halt
      0.00            +3.0        3.00        perf-profile.children.cycles-pp.acpi_idle_enter
     49.51            +3.1       52.57        perf-profile.children.cycles-pp.send_udp_stream
     49.50            +3.1       52.56        perf-profile.children.cycles-pp.send_omni_inner
      0.00            +3.1        3.10        perf-profile.children.cycles-pp.cpuidle_enter_state
      0.00            +3.1        3.12        perf-profile.children.cycles-pp.cpuidle_enter
      0.00            +3.4        3.37        perf-profile.children.cycles-pp.cpuidle_idle_call
     48.90            +3.4       52.30        perf-profile.children.cycles-pp.sendto
     47.85            +4.0       51.83        perf-profile.children.cycles-pp.__x64_sys_sendto
     47.73            +4.0       51.77        perf-profile.children.cycles-pp.__sys_sendto
      0.00            +4.1        4.10        perf-profile.children.cycles-pp.start_secondary
      0.00            +4.1        4.13        perf-profile.children.cycles-pp.do_idle
      0.00            +4.1        4.14        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
      0.00            +4.1        4.14        perf-profile.children.cycles-pp.cpu_startup_entry
     46.54            +4.7       51.28        perf-profile.children.cycles-pp.sock_sendmsg
     46.10            +5.0       51.11        perf-profile.children.cycles-pp.udp_sendmsg
      3.70            +8.0       11.71        perf-profile.children.cycles-pp.copyout
      3.71            +8.1       11.80        perf-profile.children.cycles-pp._copy_to_iter
      3.96            +8.5       12.43        perf-profile.children.cycles-pp.__skb_datagram_iter
      3.96            +8.5       12.44        perf-profile.children.cycles-pp.skb_copy_datagram_iter
     35.14           +11.3       46.40        perf-profile.children.cycles-pp.ip_make_skb
     32.71 ±  2%     +13.0       45.66        perf-profile.children.cycles-pp.__ip_append_data
     10.28           +20.6       30.89        perf-profile.children.cycles-pp.sk_page_frag_refill
     10.25           +20.6       30.88        perf-profile.children.cycles-pp.skb_page_frag_refill
      9.86           +20.8       30.63        perf-profile.children.cycles-pp.__alloc_pages
      9.62           +20.8       30.42        perf-profile.children.cycles-pp.get_page_from_freelist
      8.42           +21.3       29.72        perf-profile.children.cycles-pp.rmqueue
      6.47           +22.8       29.22        perf-profile.children.cycles-pp.rmqueue_bulk
     19.11 ±  2%      -6.0       13.08        perf-profile.self.cycles-pp.copyin
      1.81 ±  2%      -1.4        0.39        perf-profile.self.cycles-pp.rmqueue
      1.81 ±  2%      -1.3        0.46 ±  2%  perf-profile.self.cycles-pp.__ip_select_ident
      1.47 ±  4%      -1.2        0.31        perf-profile.self.cycles-pp.free_unref_page_commit
      1.29 ±  2%      -0.5        0.75        perf-profile.self.cycles-pp.__ip_append_data
      0.71            -0.4        0.29        perf-profile.self.cycles-pp.udp_sendmsg
      0.68 ±  2%      -0.4        0.32        perf-profile.self.cycles-pp.__zone_watermark_ok
      0.50            -0.3        0.16        perf-profile.self.cycles-pp.skb_release_data
      0.59 ±  3%      -0.3        0.26 ±  3%  perf-profile.self.cycles-pp.fib_table_lookup
      0.46 ±  4%      -0.3        0.15 ±  3%  perf-profile.self.cycles-pp.kmem_cache_free
      0.63            -0.3        0.33 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.47            -0.3        0.19        perf-profile.self.cycles-pp.__sys_sendto
      0.44            -0.2        0.21 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_node
      0.36            -0.2        0.16 ±  3%  perf-profile.self.cycles-pp.send_omni_inner
      0.35 ±  2%      -0.2        0.15 ±  3%  perf-profile.self.cycles-pp.ip_finish_output2
      0.29            -0.2        0.12        perf-profile.self.cycles-pp._copy_from_user
      0.24            -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.__netif_receive_skb_core
      0.22 ±  2%      -0.1        0.08 ±  5%  perf-profile.self.cycles-pp.free_unref_page
      0.19 ±  2%      -0.1        0.06        perf-profile.self.cycles-pp.ip_rcv_core
      0.21 ±  2%      -0.1        0.08        perf-profile.self.cycles-pp.__alloc_skb
      0.20 ±  2%      -0.1        0.08        perf-profile.self.cycles-pp.sock_wfree
      0.22 ±  2%      -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.send_data
      0.21            -0.1        0.09        perf-profile.self.cycles-pp.sendto
      0.21 ±  2%      -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.ip_rcv_finish_core
      0.21 ±  2%      -0.1        0.09 ±  5%  perf-profile.self.cycles-pp.__ip_make_skb
      0.20 ±  4%      -0.1        0.09 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.21 ±  2%      -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.__dev_queue_xmit
      0.38 ±  3%      -0.1        0.27        perf-profile.self.cycles-pp.get_page_from_freelist
      0.20 ±  2%      -0.1        0.09        perf-profile.self.cycles-pp.udp_send_skb
      0.18 ±  2%      -0.1        0.07        perf-profile.self.cycles-pp.__udp_enqueue_schedule_skb
      0.18 ±  4%      -0.1        0.08 ±  6%  perf-profile.self.cycles-pp.__mkroute_output
      0.25            -0.1        0.15 ±  3%  perf-profile.self.cycles-pp._copy_from_iter
      0.27 ±  4%      -0.1        0.17 ±  2%  perf-profile.self.cycles-pp.skb_page_frag_refill
      0.16            -0.1        0.06 ±  7%  perf-profile.self.cycles-pp.sock_sendmsg
      0.33 ±  2%      -0.1        0.24        perf-profile.self.cycles-pp.__slab_free
      0.15 ±  3%      -0.1        0.06        perf-profile.self.cycles-pp.udp4_lib_lookup2
      0.38 ±  2%      -0.1        0.29 ±  2%  perf-profile.self.cycles-pp.free_unref_page_prepare
      0.26            -0.1        0.17        perf-profile.self.cycles-pp._raw_spin_trylock
      0.15            -0.1        0.06        perf-profile.self.cycles-pp.ip_output
      0.14            -0.1        0.05 ±  8%  perf-profile.self.cycles-pp.process_backlog
      0.14            -0.1        0.06        perf-profile.self.cycles-pp.ip_route_output_flow
      0.14            -0.1        0.06        perf-profile.self.cycles-pp.__udp4_lib_lookup
      0.21 ±  2%      -0.1        0.13 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12 ±  3%      -0.1        0.05        perf-profile.self.cycles-pp.siphash_3u32
      0.13 ±  3%      -0.1        0.06 ±  8%  perf-profile.self.cycles-pp.ip_send_skb
      0.17            -0.1        0.10        perf-profile.self.cycles-pp.__do_softirq
      0.15 ±  3%      -0.1        0.08 ±  5%  perf-profile.self.cycles-pp.skb_set_owner_w
      0.17 ±  2%      -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.aa_sk_perm
      0.12            -0.1        0.05        perf-profile.self.cycles-pp.__x64_sys_sendto
      0.12 ±  6%      -0.1        0.05        perf-profile.self.cycles-pp.fib_lookup_good_nhc
      0.19 ±  2%      -0.1        0.13        perf-profile.self.cycles-pp.__list_add_valid_or_report
      0.14 ±  3%      -0.1        0.07 ±  6%  perf-profile.self.cycles-pp.net_rx_action
      0.16 ±  2%      -0.1        0.10        perf-profile.self.cycles-pp.do_syscall_64
      0.11            -0.1        0.05        perf-profile.self.cycles-pp.__udp4_lib_rcv
      0.16 ±  3%      -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.11 ±  4%      -0.1        0.05        perf-profile.self.cycles-pp.ip_route_output_key_hash_rcu
      0.10 ±  4%      -0.1        0.05        perf-profile.self.cycles-pp.ip_generic_getfrag
      0.10            -0.1        0.05        perf-profile.self.cycles-pp.ipv4_mtu
      0.26            -0.0        0.21 ±  2%  perf-profile.self.cycles-pp.__fget_light
      0.15 ±  3%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.24            -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.__alloc_pages
      0.15 ±  3%      -0.0        0.12        perf-profile.self.cycles-pp.__check_object_size
      0.11            -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.08 ±  5%      -0.0        0.05        perf-profile.self.cycles-pp.loopback_xmit
      0.13            -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.prep_compound_page
      0.11            -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.irqtime_account_irq
      0.09 ± 10%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__xfrm_policy_check2
      0.07            -0.0        0.05        perf-profile.self.cycles-pp.alloc_pages
      0.08            -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__entry_text_start
      0.09 ±  5%      -0.0        0.07        perf-profile.self.cycles-pp.free_tail_page_prepare
      0.10            +0.0        0.11        perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
      0.06            +0.0        0.08 ±  6%  perf-profile.self.cycles-pp.free_pcppages_bulk
      0.05 ±  8%      +0.0        0.10 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_bh
      0.07            +0.0        0.12        perf-profile.self.cycles-pp.__mod_zone_page_state
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.cpuidle_idle_call
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.udp_rmem_release
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.__flush_smp_call_function_queue
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.sock_def_readable
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.update_cfs_group
      0.11 ± 11%      +0.1        0.17 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.1        0.05 ±  8%  perf-profile.self.cycles-pp.finish_task_switch
      0.00            +0.1        0.05 ±  8%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.00            +0.1        0.06        perf-profile.self.cycles-pp.do_idle
      0.00            +0.1        0.06        perf-profile.self.cycles-pp.__skb_wait_for_more_packets
      0.00            +0.1        0.06        perf-profile.self.cycles-pp.__x2apic_send_IPI_dest
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.enqueue_entity
      0.00            +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.schedule_timeout
      0.00            +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.move_addr_to_user
      0.00            +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.menu_select
      0.00            +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.native_apic_msr_eoi
      0.00            +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.00            +0.1        0.07        perf-profile.self.cycles-pp.__update_load_avg_se
      0.00            +0.1        0.07        perf-profile.self.cycles-pp.__get_user_4
      0.00            +0.1        0.08 ±  6%  perf-profile.self.cycles-pp.__sk_mem_reduce_allocated
      0.00            +0.1        0.08        perf-profile.self.cycles-pp.update_curr
      0.00            +0.1        0.08 ±  5%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.00            +0.1        0.09 ±  5%  perf-profile.self.cycles-pp.try_to_wake_up
      0.00            +0.1        0.09        perf-profile.self.cycles-pp.recvfrom
      0.00            +0.1        0.09        perf-profile.self.cycles-pp.mem_cgroup_charge_skmem
      0.00            +0.1        0.09        perf-profile.self.cycles-pp.update_load_avg
      0.00            +0.1        0.09 ±  5%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.00            +0.1        0.10 ±  4%  perf-profile.self.cycles-pp._copy_to_iter
      0.00            +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.newidle_balance
      0.00            +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.recv_data
      0.00            +0.1        0.10        perf-profile.self.cycles-pp.refill_stock
      0.00            +0.1        0.10        perf-profile.self.cycles-pp.__switch_to_asm
      0.00            +0.1        0.11 ± 15%  perf-profile.self.cycles-pp._copy_to_user
      0.00            +0.1        0.12        perf-profile.self.cycles-pp.recv_omni
      0.00            +0.1        0.12        perf-profile.self.cycles-pp.mem_cgroup_uncharge_skmem
      0.00            +0.1        0.13 ±  3%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.00            +0.1        0.13        perf-profile.self.cycles-pp.__switch_to
      0.06            +0.1        0.20 ±  2%  perf-profile.self.cycles-pp.rmqueue_bulk
      0.09 ±  5%      +0.1        0.23 ±  4%  perf-profile.self.cycles-pp.udp_recvmsg
      0.00            +0.1        0.14 ±  3%  perf-profile.self.cycles-pp.__skb_recv_udp
      0.00            +0.1        0.14 ±  3%  perf-profile.self.cycles-pp.___perf_sw_event
      0.08            +0.1        0.22 ±  2%  perf-profile.self.cycles-pp.__skb_datagram_iter
      0.03 ± 70%      +0.2        0.20 ±  4%  perf-profile.self.cycles-pp.page_counter_try_charge
      0.02 ±141%      +0.2        0.18 ±  4%  perf-profile.self.cycles-pp.__sys_recvfrom
      0.00            +0.2        0.17 ±  2%  perf-profile.self.cycles-pp.__schedule
      0.00            +0.2        0.17 ±  2%  perf-profile.self.cycles-pp.try_charge_memcg
      0.00            +0.2        0.17 ±  2%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.00            +0.2        0.21 ±  2%  perf-profile.self.cycles-pp.__sk_mem_raise_allocated
      0.14 ±  3%      +0.2        0.36        perf-profile.self.cycles-pp.__free_one_page
      0.20 ±  2%      +0.3        0.47        perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
      0.00            +2.1        2.07 ±  2%  perf-profile.self.cycles-pp.acpi_safe_halt
     49.78            +2.7       52.49        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      3.68            +8.0       11.64        perf-profile.self.cycles-pp.copyout
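
For anyone post-processing the comparison rows above, here is a minimal
sketch of a row parser. It is not part of the lkp tooling. The row layout
(base value, optional "± N%" stddev, delta, patched value, optional "± N%"
stddev, metric name) is assumed from the rows in this report, and the names
ROW and parse_row are hypothetical.

```python
import re

# Layout assumed from the rows in this report (not an official lkp format spec):
#   <base> [± N%]  <delta>[%]  <patched> [± N%]  <metric>
ROW = re.compile(
    r"^(?P<base>\d+(?:\.\d+)?)"
    r"(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<delta>[+-]\d+(?:\.\d+)?)(?P<rel>%?)"
    r"\s+(?P<new>\d+(?:\.\d+)?)"
    r"(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)$"
)


def _opt_int(text):
    """Convert an optional stddev capture to int, keeping None when absent."""
    return int(text) if text is not None else None


def parse_row(line):
    """Return a dict for one comparison row, or None if the line is not a row."""
    # Drop email quote markers and surrounding whitespace before matching.
    m = ROW.match(line.lstrip("> \t").rstrip())
    if m is None:
        return None
    return {
        "base": float(m.group("base")),
        "base_sd": _opt_int(m.group("base_sd")),
        "delta": float(m.group("delta")),
        # True when the delta carries a "%" sign (relative change); the
        # perf-profile rows instead report an absolute difference in points.
        "relative": m.group("rel") == "%",
        "new": float(m.group("new")),
        "new_sd": _opt_int(m.group("new_sd")),
        "metric": m.group("metric"),
    }
```

Sorting the parsed rows by the "delta" field gives the largest movers first,
which is how the table above is already ordered within each group.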



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Huang, Ying Nov. 6, 2023, 6:38 a.m. UTC | #3
Hi,

kernel test robot <oliver.sang@intel.com> writes:

> hi, Huang Ying,
>
> sorry for the late report.
> we reported
> "a 14.6% improvement of netperf.Throughput_Mbps"
> in
> https://lore.kernel.org/all/202310271441.71ce0a9-oliver.sang@intel.com/
>
> later, our auto-bisect tool captured a regression on a netperf test with
> a different configuration. unfortunately, it treated the regression as
> already 'reported', so we missed this report the first time.
>
> now sending it again, FYI.
>
>
> Hello,
>
> kernel test robot noticed a -60.4% regression of netperf.Throughput_Mbps on:
>
>
> commit: f5ddc662f07d7d99e9cfc5e07778e26c7394caf8 ("[PATCH -V3 3/9] mm, pcp: reduce lock contention for draining high-order pages")
> url: https://github.com/intel-lab-lkp/linux/commits/Huang-Ying/mm-pcp-avoid-to-drain-PCP-when-process-exit/20231017-143633
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git 36b2d7dd5a8ac95c8c1e69bdc93c4a6e2dc28a23
> patch link: https://lore.kernel.org/all/20231016053002.756205-4-ying.huang@intel.com/
> patch subject: [PATCH -V3 3/9] mm, pcp: reduce lock contention for draining high-order pages
>
> testcase: netperf
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
>
> 	ip: ipv4
> 	runtime: 300s
> 	nr_threads: 50%
> 	cluster: cs-localhost
> 	test: UDP_STREAM
> 	cpufreq_governor: performance
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202311061311.8d63998-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231106/202311061311.8d63998-oliver.sang@intel.com
>
> =========================================================================================
> cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
>   cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1-x86_64-20220510.cgz/300s/lkp-icl-2sp2/UDP_STREAM/netperf
>
> commit: 
>   c828e65251 ("cacheinfo: calculate size of per-CPU data cache slice")
>   f5ddc662f0 ("mm, pcp: reduce lock contention for draining high-order pages")
>
> c828e65251502516 f5ddc662f07d7d99e9cfc5e0777 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>       7321   4%     +28.2%       9382        uptime.idle
>      50.65   4%      -4.0%      48.64        boot-time.boot
>       6042   4%      -4.2%       5785        boot-time.idle
>  1.089e+09   2%    +232.1%  3.618e+09        cpuidle..time
>    1087075   2%  +24095.8%   2.63e+08        cpuidle..usage
>    3357014           +99.9%    6710312        vmstat.memory.cache
>      48731  19%   +4666.5%    2322787        vmstat.system.cs
>     144637          +711.2%    1173334        vmstat.system.in
>       2.59   2%      +6.2        8.79        mpstat.cpu.all.idle%
>       1.01            +0.7        1.66        mpstat.cpu.all.irq%
>       6.00            -3.2        2.79        mpstat.cpu.all.soft%
>       1.13   2%      -0.1        1.02        mpstat.cpu.all.usr%
>  1.407e+09   3%     -28.2%  1.011e+09        numa-numastat.node0.local_node
>  1.407e+09   3%     -28.2%   1.01e+09        numa-numastat.node0.numa_hit
>  1.469e+09   8%     -32.0%  9.979e+08        numa-numastat.node1.local_node
>  1.469e+09   8%     -32.1%  9.974e+08        numa-numastat.node1.numa_hit
>     103.00  19%     -44.0%      57.67  20%  perf-c2c.DRAM.local
>       8970  12%     -89.4%     951.00   4%  perf-c2c.DRAM.remote
>       8192   5%     +68.5%      13807        perf-c2c.HITM.local
>       6675  11%     -92.6%     491.00   2%  perf-c2c.HITM.remote
>    1051014   2%  +24922.0%   2.63e+08        turbostat.C1
>       2.75   2%      +6.5        9.29        turbostat.C1%
>       2.72   2%    +178.3%       7.57        turbostat.CPU%c1
>       0.09           -22.2%       0.07        turbostat.IPC
>   44589125          +701.5%  3.574e+08        turbostat.IRQ
>     313.00  57%   +1967.0%       6469   8%  turbostat.POLL
>      70.33            +3.3%      72.67        turbostat.PkgTmp
>      44.23   4%     -31.8%      30.15   2%  turbostat.RAMWatt
>     536096          +583.7%    3665194        meminfo.Active
>     535414          +584.4%    3664543        meminfo.Active(anon)
>    3238301          +103.2%    6579677        meminfo.Cached
>    1204424          +278.9%    4563575        meminfo.Committed_AS
>     469093           +47.9%     693889   3%  meminfo.Inactive
>     467250           +48.4%     693496   3%  meminfo.Inactive(anon)
>      53615          +562.5%     355225   4%  meminfo.Mapped
>    5223078           +64.1%    8571212        meminfo.Memused
>     557305          +599.6%    3899111        meminfo.Shmem
>    5660207           +58.9%    8993642        meminfo.max_used_kB
>      78504   3%     -30.1%      54869        netperf.ThroughputBoth_Mbps
>    5024292   3%     -30.1%    3511666        netperf.ThroughputBoth_total_Mbps
>       7673   5%    +249.7%      26832        netperf.ThroughputRecv_Mbps
>     491074   5%    +249.7%    1717287        netperf.ThroughputRecv_total_Mbps
>      70831   2%     -60.4%      28037        netperf.Throughput_Mbps
>    4533217   2%     -60.4%    1794379        netperf.Throughput_total_Mbps

This is a UDP test, so the sender does not wait for the receiver.  In
the results above, the sender throughput drops by 60.4%, while the
receiver throughput increases by 249.7%.  Far fewer packets are dropped
during the test as well, which is also good.

All in all, considering the performance of both the sender and the
receiver, I think the patch helps performance.
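
As a rough check of the sender/receiver argument above, the sketch below
recomputes the two percentage changes from the netperf rows of the report and
estimates what share of the sent traffic the receiver actually read.  It
assumes that netperf.Throughput_Mbps is the sender side and
netperf.ThroughputRecv_Mbps is the receiver side, and that the two figures are
directly comparable; the function and variable names are illustrative only.

```python
# Per-thread averages (Mbps) copied from the netperf rows of the report.
# Assumption: Throughput_Mbps = sender side, ThroughputRecv_Mbps = receiver side.
BASE = {"send": 70831, "recv": 7673}      # c828e65251 (parent commit)
PATCHED = {"send": 28037, "recv": 26832}  # f5ddc662f0 (this patch)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def delivered_fraction(sample):
    """Share of the sent UDP traffic that the receiver read (rough estimate)."""
    return sample["recv"] / sample["send"]


if __name__ == "__main__":
    print(f"sender change:   {percent_change(BASE['send'], PATCHED['send']):+.1f}%")
    print(f"receiver change: {percent_change(BASE['recv'], PATCHED['recv']):+.1f}%")
    for name, sample in (("base", BASE), ("patched", PATCHED)):
        print(f"{name}: delivered about {delivered_fraction(sample):.1%} of sent traffic")
```

Under these assumptions, the receiver reads about 10.8% of the sent traffic
before the patch and about 95.7% with it, which matches the observation that
far fewer packets are dropped.  The sum of the two sides also matches the
netperf.ThroughputBoth_Mbps rows (78504 and 54869).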

--
Best Regards,
Huang, Ying

>       5439            +9.4%       5949        netperf.time.percent_of_cpu_this_job_got
>      16206            +9.4%      17728        netperf.time.system_time
>     388.14           -51.9%     186.53        netperf.time.user_time
>  2.876e+09   3%     -30.1%   2.01e+09        netperf.workload
>     177360  30%     -36.0%     113450  20%  numa-meminfo.node0.AnonPages
>     255926  12%     -40.6%     152052  12%  numa-meminfo.node0.AnonPages.max
>      22582  61%    +484.2%     131916  90%  numa-meminfo.node0.Mapped
>     138287  17%     +22.6%     169534  12%  numa-meminfo.node1.AnonHugePages
>     267468  20%     +29.1%     345385   6%  numa-meminfo.node1.AnonPages
>     346204  18%     +34.5%     465696   2%  numa-meminfo.node1.AnonPages.max
>     279416  19%     +77.0%     494652  18%  numa-meminfo.node1.Inactive
>     278445  19%     +77.6%     494393  18%  numa-meminfo.node1.Inactive(anon)
>      31726  45%    +607.7%     224533  45%  numa-meminfo.node1.Mapped
>       4802   6%     +19.4%       5733   3%  numa-meminfo.node1.PageTables
>     297323  12%    +792.6%    2653850  63%  numa-meminfo.node1.Shmem
>      44325  30%     -36.0%      28379  20%  numa-vmstat.node0.nr_anon_pages
>       5590  61%    +491.0%      33042  90%  numa-vmstat.node0.nr_mapped
>  1.407e+09   3%     -28.2%   1.01e+09        numa-vmstat.node0.numa_hit
>  1.407e+09   3%     -28.2%  1.011e+09        numa-vmstat.node0.numa_local
>      66858  20%     +29.2%      86385   6%  numa-vmstat.node1.nr_anon_pages
>      69601  20%     +77.8%     123729  18%  numa-vmstat.node1.nr_inactive_anon
>       7953  45%    +608.3%      56335  45%  numa-vmstat.node1.nr_mapped
>       1201   6%     +19.4%       1434   3%  numa-vmstat.node1.nr_page_table_pages
>      74288  11%    +792.6%     663111  63%  numa-vmstat.node1.nr_shmem
>      69601  20%     +77.8%     123728  18%  numa-vmstat.node1.nr_zone_inactive_anon
>  1.469e+09   8%     -32.1%  9.974e+08        numa-vmstat.node1.numa_hit
>  1.469e+09   8%     -32.0%  9.979e+08        numa-vmstat.node1.numa_local
>     133919          +584.2%     916254        proc-vmstat.nr_active_anon
>     111196            +3.3%     114828        proc-vmstat.nr_anon_pages
>    5602484            -1.5%    5518799        proc-vmstat.nr_dirty_background_threshold
>   11218668            -1.5%   11051092        proc-vmstat.nr_dirty_threshold
>     809646          +103.2%    1645012        proc-vmstat.nr_file_pages
>   56374629            -1.5%   55536913        proc-vmstat.nr_free_pages
>     116775           +48.4%     173349   3%  proc-vmstat.nr_inactive_anon
>      13386   2%    +563.3%      88793   4%  proc-vmstat.nr_mapped
>       2286            +6.5%       2434        proc-vmstat.nr_page_table_pages
>     139393          +599.4%     974869        proc-vmstat.nr_shmem
>      29092            +6.6%      31019        proc-vmstat.nr_slab_reclaimable
>     133919          +584.2%     916254        proc-vmstat.nr_zone_active_anon
>     116775           +48.4%     173349   3%  proc-vmstat.nr_zone_inactive_anon
>      32135  11%    +257.2%     114797  21%  proc-vmstat.numa_hint_faults
>      20858  16%    +318.3%      87244   6%  proc-vmstat.numa_hint_faults_local
>  2.876e+09   3%     -30.2%  2.008e+09        proc-vmstat.numa_hit
>  2.876e+09   3%     -30.2%  2.008e+09        proc-vmstat.numa_local
>      25453   7%     -75.2%       6324  30%  proc-vmstat.numa_pages_migrated
>     178224   2%     +76.6%     314680   7%  proc-vmstat.numa_pte_updates
>     160889   3%    +267.6%     591393   6%  proc-vmstat.pgactivate
>  2.295e+10   3%     -30.2%  1.601e+10        proc-vmstat.pgalloc_normal
>    1026605           +21.9%    1251671        proc-vmstat.pgfault
>  2.295e+10   3%     -30.2%  1.601e+10        proc-vmstat.pgfree
>      25453   7%     -75.2%       6324  30%  proc-vmstat.pgmigrate_success
>      39208   2%      -6.1%      36815        proc-vmstat.pgreuse
>    3164416           -20.3%    2521344   2%  proc-vmstat.unevictable_pgs_scanned
>   19248627           -22.1%   14989905        sched_debug.cfs_rq:/.avg_vruntime.avg
>   20722680           -24.9%   15569530        sched_debug.cfs_rq:/.avg_vruntime.max
>   17634233           -22.5%   13663168        sched_debug.cfs_rq:/.avg_vruntime.min
>     949063   2%     -70.5%     280388        sched_debug.cfs_rq:/.avg_vruntime.stddev
>       0.78  10%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_running.min
>       0.16   8%    +113.3%       0.33   2%  sched_debug.cfs_rq:/.h_nr_running.stddev
>       0.56 141%  +2.2e+07%     122016  52%  sched_debug.cfs_rq:/.left_vruntime.avg
>      45.01 141%  +2.2e+07%   10035976  28%  sched_debug.cfs_rq:/.left_vruntime.max
>       4.58 141%  +2.3e+07%    1072762  36%  sched_debug.cfs_rq:/.left_vruntime.stddev
>       5814  10%    -100.0%       0.00        sched_debug.cfs_rq:/.load.min
>       5.39   9%     -73.2%       1.44  10%  sched_debug.cfs_rq:/.load_avg.min
>   19248627           -22.1%   14989905        sched_debug.cfs_rq:/.min_vruntime.avg
>   20722680           -24.9%   15569530        sched_debug.cfs_rq:/.min_vruntime.max
>   17634233           -22.5%   13663168        sched_debug.cfs_rq:/.min_vruntime.min
>     949063   2%     -70.5%     280388        sched_debug.cfs_rq:/.min_vruntime.stddev
>       0.78  10%    -100.0%       0.00        sched_debug.cfs_rq:/.nr_running.min
>       0.06   8%    +369.2%       0.30   3%  sched_debug.cfs_rq:/.nr_running.stddev
>       4.84  26%   +1611.3%      82.79  67%  sched_debug.cfs_rq:/.removed.load_avg.avg
>      27.92  12%   +3040.3%     876.79  68%  sched_debug.cfs_rq:/.removed.load_avg.stddev
>       0.56 141%  +2.2e+07%     122016  52%  sched_debug.cfs_rq:/.right_vruntime.avg
>      45.06 141%  +2.2e+07%   10035976  28%  sched_debug.cfs_rq:/.right_vruntime.max
>       4.59 141%  +2.3e+07%    1072762  36%  sched_debug.cfs_rq:/.right_vruntime.stddev
>     900.25           -10.4%     806.45        sched_debug.cfs_rq:/.runnable_avg.avg
>     533.28   4%     -87.0%      69.56  39%  sched_debug.cfs_rq:/.runnable_avg.min
>     122.77   2%     +92.9%     236.86        sched_debug.cfs_rq:/.runnable_avg.stddev
>     896.13           -10.8%     799.44        sched_debug.cfs_rq:/.util_avg.avg
>     379.06   4%     -83.4%      62.94  37%  sched_debug.cfs_rq:/.util_avg.min
>     116.35   8%     +99.4%     232.04        sched_debug.cfs_rq:/.util_avg.stddev
>     550.87           -14.2%     472.66   2%  sched_debug.cfs_rq:/.util_est_enqueued.avg
>       1124   8%     +18.2%       1329   3%  sched_debug.cfs_rq:/.util_est_enqueued.max
>     134.17  30%    -100.0%       0.00        sched_debug.cfs_rq:/.util_est_enqueued.min
>     558243   6%     -66.9%     184666        sched_debug.cpu.avg_idle.avg
>      12860  11%     -56.1%       5644        sched_debug.cpu.avg_idle.min
>     365635           -53.5%     169863   5%  sched_debug.cpu.avg_idle.stddev
>       9.56   3%     -28.4%       6.84   8%  sched_debug.cpu.clock.stddev
>       6999   2%     -85.6%       1007   3%  sched_debug.cpu.clock_task.stddev
>       3985  10%    -100.0%       0.00        sched_debug.cpu.curr->pid.min
>     491.71  10%    +209.3%       1520   4%  sched_debug.cpu.curr->pid.stddev
>     270.19 141%   +1096.6%       3233  51%  sched_debug.cpu.max_idle_balance_cost.stddev
>       0.78  10%    -100.0%       0.00        sched_debug.cpu.nr_running.min
>       0.15   6%    +121.7%       0.34   2%  sched_debug.cpu.nr_running.stddev
>      62041  15%   +4280.9%    2717948        sched_debug.cpu.nr_switches.avg
>    1074922  14%    +292.6%    4220307   2%  sched_debug.cpu.nr_switches.max
>       1186   2%  +1.2e+05%    1379073   4%  sched_debug.cpu.nr_switches.min
>     132392  21%    +294.6%     522476   5%  sched_debug.cpu.nr_switches.stddev
>       6.44   4%     +21.4%       7.82  12%  sched_debug.cpu.nr_uninterruptible.stddev
>       6.73  13%     -84.8%       1.02   5%  perf-stat.i.MPKI
>  1.652e+10   2%     -22.2%  1.285e+10        perf-stat.i.branch-instructions
>       0.72            +0.0        0.75        perf-stat.i.branch-miss-rate%
>   1.19e+08   3%     -19.8%   95493630        perf-stat.i.branch-misses
>      27.46  12%     -26.2        1.30   4%  perf-stat.i.cache-miss-rate%
>  5.943e+08  10%     -88.6%   67756219   5%  perf-stat.i.cache-misses
>  2.201e+09          +143.7%  5.364e+09        perf-stat.i.cache-references
>      48911  19%   +4695.4%    2345525        perf-stat.i.context-switches
>       3.66   2%     +28.5%       4.71        perf-stat.i.cpi
>  3.228e+11            -4.1%  3.097e+11        perf-stat.i.cpu-cycles
>     190.51         +1363.7%       2788  10%  perf-stat.i.cpu-migrations
>     803.99   6%    +510.2%       4905   5%  perf-stat.i.cycles-between-cache-misses
>       0.00  16%      +0.0        0.01  14%  perf-stat.i.dTLB-load-miss-rate%
>     755654  18%    +232.4%    2512024  14%  perf-stat.i.dTLB-load-misses
>  2.385e+10   2%     -26.9%  1.742e+10        perf-stat.i.dTLB-loads
>       0.00  31%      +0.0        0.01  35%  perf-stat.i.dTLB-store-miss-rate%
>     305657  36%    +200.0%     916822  35%  perf-stat.i.dTLB-store-misses
>  1.288e+10   2%     -28.8%  9.179e+09        perf-stat.i.dTLB-stores
>  8.789e+10   2%     -25.2%  6.578e+10        perf-stat.i.instructions
>       0.28   2%     -21.6%       0.22        perf-stat.i.ipc
>       2.52            -4.1%       2.42        perf-stat.i.metric.GHz
>     873.89  12%     -67.0%     288.04   8%  perf-stat.i.metric.K/sec
>     435.61   2%     -19.6%     350.06        perf-stat.i.metric.M/sec
>       2799           +29.9%       3637   2%  perf-stat.i.minor-faults
>      99.74            -2.6       97.11        perf-stat.i.node-load-miss-rate%
>  1.294e+08  12%     -92.4%    9879207   7%  perf-stat.i.node-load-misses
>      76.55           +16.4       92.92        perf-stat.i.node-store-miss-rate%
>  2.257e+08  10%     -90.4%   21721672   8%  perf-stat.i.node-store-misses
>   69217511  13%     -97.7%    1625810   7%  perf-stat.i.node-stores
>       2799           +29.9%       3637   2%  perf-stat.i.page-faults
>       6.79  13%     -84.9%       1.03   5%  perf-stat.overall.MPKI
>       0.72            +0.0        0.74        perf-stat.overall.branch-miss-rate%
>      27.06  12%     -25.8        1.26   4%  perf-stat.overall.cache-miss-rate%
>       3.68   2%     +28.1%       4.71        perf-stat.overall.cpi
>     549.38  10%    +736.0%       4592   5%  perf-stat.overall.cycles-between-cache-misses
>       0.00  18%      +0.0        0.01  14%  perf-stat.overall.dTLB-load-miss-rate%
>       0.00  36%      +0.0        0.01  35%  perf-stat.overall.dTLB-store-miss-rate%
>       0.27   2%     -22.0%       0.21        perf-stat.overall.ipc
>      99.80            -2.4       97.37        perf-stat.overall.node-load-miss-rate%
>      76.60           +16.4       93.03        perf-stat.overall.node-store-miss-rate%
>       9319            +5.8%       9855        perf-stat.overall.path-length
>  1.646e+10   2%     -22.2%  1.281e+10        perf-stat.ps.branch-instructions
>  1.186e+08   3%     -19.8%   95167897        perf-stat.ps.branch-misses
>  5.924e+08  10%     -88.6%   67384354   5%  perf-stat.ps.cache-misses
>  2.193e+09          +143.4%  5.339e+09        perf-stat.ps.cache-references
>      49100  19%   +4668.0%    2341074        perf-stat.ps.context-switches
>  3.218e+11            -4.1%  3.087e+11        perf-stat.ps.cpu-cycles
>     189.73         +1368.4%       2786  10%  perf-stat.ps.cpu-migrations
>     753056  18%    +229.9%    2484575  14%  perf-stat.ps.dTLB-load-misses
>  2.377e+10   2%     -26.9%  1.737e+10        perf-stat.ps.dTLB-loads
>     304509  36%    +199.1%     910856  35%  perf-stat.ps.dTLB-store-misses
>  1.284e+10   2%     -28.7%  9.152e+09        perf-stat.ps.dTLB-stores
>   8.76e+10   2%     -25.2%  6.557e+10        perf-stat.ps.instructions
>       2791           +28.2%       3580   2%  perf-stat.ps.minor-faults
>   1.29e+08  12%     -92.4%    9815672   7%  perf-stat.ps.node-load-misses
>   2.25e+08  10%     -90.4%   21575943   8%  perf-stat.ps.node-store-misses
>   69002373  13%     -97.7%    1615410   7%  perf-stat.ps.node-stores
>       2791           +28.2%       3580   2%  perf-stat.ps.page-faults
>   2.68e+13   2%     -26.1%  1.981e+13        perf-stat.total.instructions
>       0.00  35%   +2600.0%       0.04  23%  perf-sched.sch_delay.avg.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
>       1.18   9%     -98.1%       0.02  32%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>       0.58   3%     -62.1%       0.22  97%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.51  22%     -82.7%       0.09  11%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
>       0.25  23%     -59.6%       0.10  10%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       0.03  42%     -64.0%       0.01  15%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.04   7%    +434.6%       0.23  36%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>       1.00  20%     -84.1%       0.16  78%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
>       0.01   7%     -70.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>       0.02   2%    +533.9%       0.12  43%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>       0.03   7%    +105.9%       0.06  33%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       0.01  15%     +67.5%       0.02   8%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.09  50%     -85.7%       0.01  33%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
>       0.04   7%    +343.4%       0.16   6%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.06  41%   +3260.7%       1.88  30%  perf-sched.sch_delay.max.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
>       3.78           -96.2%       0.14   3%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>       2.86   4%     -72.6%       0.78 113%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       4.09   7%     -34.1%       2.69   7%  perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
>       3.09  37%     -64.1%       1.11   5%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       0.00 141%   +6200.0%       0.13  82%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
>       3.94           -40.5%       2.35  48%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>       1.63  21%     -77.0%       0.38  90%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
>       7.29  39%    +417.5%      37.72  16%  perf-sched.sch_delay.max.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>       3.35  14%     -51.7%       1.62   3%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       0.05  13%   +2245.1%       1.13  40%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>       3.01  26%    +729.6%      25.01  91%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       1.93  59%     -85.5%       0.28  62%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
>       0.01           -50.0%       0.00        perf-sched.total_sch_delay.average.ms
>       7.29  39%    +468.8%      41.46  26%  perf-sched.total_sch_delay.max.ms
>       6.04   4%     -94.1%       0.35        perf-sched.total_wait_and_delay.average.ms
>     205790   3%   +1811.0%    3932742        perf-sched.total_wait_and_delay.count.ms
>       6.03   4%     -94.2%       0.35        perf-sched.total_wait_time.average.ms
>      75.51  41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>      23.01  17%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      23.82   7%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>      95.27  41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
>      55.86 141%   +1014.6%     622.64   5%  perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>       0.07  23%     -82.5%       0.01        perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>     137.41   3%    +345.1%     611.63   2%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>       0.04   5%     -49.6%       0.02        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>     536.33   5%     -46.5%     287.00        perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      21.67  32%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>       5.67   8%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>       1.67  56%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>       5.67  29%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
>       5.33  23%     +93.8%      10.33  25%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>     101725   3%     +15.3%     117243  10%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>     100.00   7%     -80.3%      19.67   2%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>      97762   4%   +3794.8%    3807606        perf-sched.wait_and_delay.count.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>       1091   9%    +111.9%       2311   3%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>     604.50  43%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>      37.41   9%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      27.08  13%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>     275.41  32%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
>       1313  69%    +112.1%       2786  15%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>     333.38 141%    +200.4%       1001        perf-sched.wait_and_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>       1000           -96.8%      31.85  48%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>      17.99  33%    +387.5%      87.71   8%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>       0.33  19%     -74.1%       0.09  10%  perf-sched.wait_time.avg.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
>       0.02  53%    +331.4%       0.10  50%  perf-sched.wait_time.avg.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
>       0.09  65%     -75.9%       0.02   9%  perf-sched.wait_time.avg.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
>       0.02  22%     -70.2%       0.01 141%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>      75.51  41%    -100.0%       0.04  42%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>       0.10  36%     -80.3%       0.02   9%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>       0.55  61%     -94.9%       0.03  45%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
>      23.01  17%    -100.0%       0.00 141%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      23.82   7%     -99.7%       0.07  57%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>      95.27  41%    -100.0%       0.03  89%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
>      56.30 139%   +1005.5%     622.44   5%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>       2.78  66%     -98.2%       0.05  52%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
>       0.07  23%     -82.5%       0.01        perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>     137.37   3%    +345.1%     611.40   2%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>       0.02   5%     -41.9%       0.01   3%  perf-sched.wait_time.avg.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>     536.32   5%     -46.5%     286.98        perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       4.66  20%     -56.7%       2.02  26%  perf-sched.wait_time.max.ms.__cond_resched.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
>       0.03  63%    +995.0%       0.37  26%  perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
>       1.67  87%     -92.6%       0.12  57%  perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
>       0.54 117%     -95.1%       0.03 105%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>       0.06  49%     -89.1%       0.01 141%  perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>     604.50  43%    -100.0%       0.16  83%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>       2.77  45%     -95.4%       0.13  64%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>       2.86  45%     -94.3%       0.16  91%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
>      37.41   9%    -100.0%       0.01 141%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      27.08  13%     -99.7%       0.08  61%  perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>     275.41  32%    -100.0%       0.03  89%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_block.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin
>       1313  69%    +112.1%       2786  15%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>     334.74 140%    +198.9%       1000        perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>      21.74  58%     -95.4%       1.00 103%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
>       1000           -97.6%      24.49  50%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>      10.90  27%    +682.9%      85.36   6%  perf-sched.wait_time.max.ms.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>      32.91  58%     -63.5%      12.01 115%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>     169.97   7%     -49.2%      86.29  15%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      44.08           -19.8       24.25        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.free_pcppages_bulk.free_unref_page.skb_release_data.__consume_stateless_skb
>      44.47           -19.6       24.87        perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg
>      43.63           -19.5       24.15        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.free_pcppages_bulk.free_unref_page.skb_release_data
>      45.62           -19.2       26.39        perf-profile.calltrace.cycles-pp.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg
>      45.62           -19.2       26.40        perf-profile.calltrace.cycles-pp.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
>      45.00           -19.1       25.94        perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg
>      50.41           -16.8       33.64  39%  perf-profile.calltrace.cycles-pp.accept_connections.main.__libc_start_main
>      50.41           -16.8       33.64  39%  perf-profile.calltrace.cycles-pp.accept_connection.accept_connections.main.__libc_start_main
>      50.41           -16.8       33.64  39%  perf-profile.calltrace.cycles-pp.spawn_child.accept_connection.accept_connections.main.__libc_start_main
>      50.41           -16.8       33.64  39%  perf-profile.calltrace.cycles-pp.process_requests.spawn_child.accept_connection.accept_connections.main
>      99.92           -14.2       85.72  15%  perf-profile.calltrace.cycles-pp.main.__libc_start_main
>      99.96           -14.2       85.77  15%  perf-profile.calltrace.cycles-pp.__libc_start_main
>      50.10            -8.6       41.52        perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
>      50.11            -8.6       41.55        perf-profile.calltrace.cycles-pp.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
>      50.13            -8.5       41.64        perf-profile.calltrace.cycles-pp.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      50.28            -8.0       42.27        perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom
>      50.29            -8.0       42.29        perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni
>      50.31            -7.9       42.42        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests
>      50.32            -7.8       42.47        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests.spawn_child
>      50.36            -7.6       42.78        perf-profile.calltrace.cycles-pp.recvfrom.recv_omni.process_requests.spawn_child.accept_connection
>      50.41            -7.3       43.07        perf-profile.calltrace.cycles-pp.recv_omni.process_requests.spawn_child.accept_connection.accept_connections
>      19.93   2%      -6.6       13.36        perf-profile.calltrace.cycles-pp.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
>      19.44   2%      -6.3       13.16        perf-profile.calltrace.cycles-pp._copy_from_iter.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg
>      18.99   2%      -6.1       12.90        perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.ip_generic_getfrag.__ip_append_data.ip_make_skb
>       8.95            -5.1        3.82        perf-profile.calltrace.cycles-pp.udp_send_skb.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
>       8.70            -5.0        3.71        perf-profile.calltrace.cycles-pp.ip_send_skb.udp_send_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
>       8.10            -4.6        3.45        perf-profile.calltrace.cycles-pp.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg.sock_sendmsg
>       7.69            -4.4        3.27        perf-profile.calltrace.cycles-pp.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg
>       6.51            -3.7        2.78        perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb
>       6.47            -3.7        2.75        perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb
>       6.41            -3.7        2.71        perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
>       5.88            -3.5        2.43        perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
>       5.73            -3.4        2.35        perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
>       5.69            -3.4        2.33        perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
>       5.36            -3.2        2.19        perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
>       4.59            -2.7        1.89        perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action
>       4.55   2%      -2.7        1.88        perf-profile.calltrace.cycles-pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll
>       4.40   2%      -2.6        1.81        perf-profile.calltrace.cycles-pp.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog
>       3.81   2%      -2.2        1.57        perf-profile.calltrace.cycles-pp.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
>       3.75   2%      -2.2        1.55        perf-profile.calltrace.cycles-pp.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
>       2.21   2%      -1.6        0.63        perf-profile.calltrace.cycles-pp.__ip_make_skb.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
>       1.94   2%      -1.4        0.51   2%  perf-profile.calltrace.cycles-pp.__ip_select_ident.__ip_make_skb.ip_make_skb.udp_sendmsg.sock_sendmsg
>       1.14            -0.6        0.51        perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
>       0.00            +0.5        0.53   2%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       0.00            +0.7        0.69        perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
>       0.00            +0.7        0.71        perf-profile.calltrace.cycles-pp.__sk_mem_raise_allocated.__sk_mem_schedule.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb
>       0.00            +0.7        0.72        perf-profile.calltrace.cycles-pp.__sk_mem_schedule.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv
>       0.00            +1.0        0.99  20%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp
>       0.00            +1.0        1.01  20%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
>       0.00            +1.1        1.05  20%  perf-profile.calltrace.cycles-pp.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg
>       0.00            +1.1        1.12        perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>       0.00            +1.2        1.18  20%  perf-profile.calltrace.cycles-pp.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg.sock_recvmsg
>       0.00            +1.3        1.32        perf-profile.calltrace.cycles-pp.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu
>       0.00            +2.2        2.23        perf-profile.calltrace.cycles-pp.__skb_recv_udp.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
>      49.51            +2.6       52.08        perf-profile.calltrace.cycles-pp.send_udp_stream.main.__libc_start_main
>      49.49            +2.6       52.07        perf-profile.calltrace.cycles-pp.send_omni_inner.send_udp_stream.main.__libc_start_main
>       0.00            +3.0        2.96   2%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>      48.71            +3.0       51.73        perf-profile.calltrace.cycles-pp.sendto.send_omni_inner.send_udp_stream.main.__libc_start_main
>       0.00            +3.1        3.06   2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>       0.00            +3.1        3.09        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>      48.34            +3.2       51.56        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream.main
>       0.00            +3.3        3.33   2%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      48.13            +3.8       51.96        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream
>      47.82            +4.0       51.82        perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner
>      47.70            +4.1       51.76        perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto
>       0.00            +4.1        4.08        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       0.00            +4.1        4.10        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       0.00            +4.1        4.10        perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
>       0.00            +4.1        4.14        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
>       0.00            +4.3        4.35   2%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
>      46.52            +4.8       51.27        perf-profile.calltrace.cycles-pp.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      46.04            +5.0       51.08        perf-profile.calltrace.cycles-pp.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
>       3.67            +8.0       11.63        perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg
>       3.71            +8.1       11.80        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
>       3.96            +8.5       12.42        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg
>       3.96            +8.5       12.44        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
>      35.13           +11.3       46.39        perf-profile.calltrace.cycles-pp.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
>      32.68   2%     +13.0       45.65        perf-profile.calltrace.cycles-pp.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
>      10.27           +20.3       30.59        perf-profile.calltrace.cycles-pp.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
>      10.24           +20.3       30.58        perf-profile.calltrace.cycles-pp.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg
>       9.84           +20.5       30.32        perf-profile.calltrace.cycles-pp.__alloc_pages.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb
>       9.59           +20.5       30.11        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data
>       8.40           +21.0       29.42        perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.skb_page_frag_refill.sk_page_frag_refill
>       6.13           +21.9       28.05        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist
>       6.20           +22.0       28.15        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
>       6.46           +22.5       28.91        perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.skb_page_frag_refill
>      48.24           -21.8       26.43        perf-profile.children.cycles-pp.skb_release_data
>      47.19           -21.2       25.98        perf-profile.children.cycles-pp.free_unref_page
>      44.48           -19.6       24.88        perf-profile.children.cycles-pp.free_pcppages_bulk
>      45.62           -19.2       26.40        perf-profile.children.cycles-pp.__consume_stateless_skb
>      99.95           -14.2       85.76  15%  perf-profile.children.cycles-pp.main
>      99.96           -14.2       85.77  15%  perf-profile.children.cycles-pp.__libc_start_main
>      50.10            -8.6       41.53        perf-profile.children.cycles-pp.udp_recvmsg
>      50.11            -8.6       41.56        perf-profile.children.cycles-pp.inet_recvmsg
>      50.13            -8.5       41.65        perf-profile.children.cycles-pp.sock_recvmsg
>      50.29            -8.0       42.28        perf-profile.children.cycles-pp.__sys_recvfrom
>      50.29            -8.0       42.30        perf-profile.children.cycles-pp.__x64_sys_recvfrom
>      50.38            -7.5       42.86        perf-profile.children.cycles-pp.recvfrom
>      50.41            -7.3       43.07        perf-profile.children.cycles-pp.accept_connections
>      50.41            -7.3       43.07        perf-profile.children.cycles-pp.accept_connection
>      50.41            -7.3       43.07        perf-profile.children.cycles-pp.spawn_child
>      50.41            -7.3       43.07        perf-profile.children.cycles-pp.process_requests
>      50.41            -7.3       43.07        perf-profile.children.cycles-pp.recv_omni
>      19.96   2%      -6.5       13.50        perf-profile.children.cycles-pp.ip_generic_getfrag
>      19.46   2%      -6.2       13.28        perf-profile.children.cycles-pp._copy_from_iter
>      19.21   2%      -6.1       13.14        perf-profile.children.cycles-pp.copyin
>       8.96            -5.1        3.86        perf-profile.children.cycles-pp.udp_send_skb
>       8.72            -5.0        3.75        perf-profile.children.cycles-pp.ip_send_skb
>       8.11            -4.6        3.49        perf-profile.children.cycles-pp.ip_finish_output2
>       7.72            -4.4        3.32        perf-profile.children.cycles-pp.__dev_queue_xmit
>      98.71            -4.1       94.59        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      98.51            -4.0       94.46        perf-profile.children.cycles-pp.do_syscall_64
>       6.49            -3.7        2.78        perf-profile.children.cycles-pp.do_softirq
>       6.51            -3.7        2.82        perf-profile.children.cycles-pp.__local_bh_enable_ip
>       6.43            -3.7        2.78        perf-profile.children.cycles-pp.__do_softirq
>       5.90            -3.4        2.46        perf-profile.children.cycles-pp.net_rx_action
>       5.74            -3.4        2.38        perf-profile.children.cycles-pp.__napi_poll
>       5.71            -3.4        2.36        perf-profile.children.cycles-pp.process_backlog
>       5.37            -3.2        2.21        perf-profile.children.cycles-pp.__netif_receive_skb_one_core
>       4.60            -2.7        1.91        perf-profile.children.cycles-pp.ip_local_deliver_finish
>       4.57   2%      -2.7        1.90        perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
>       4.42   2%      -2.6        1.83        perf-profile.children.cycles-pp.__udp4_lib_rcv
>       3.82   2%      -2.2        1.58   2%  perf-profile.children.cycles-pp.udp_unicast_rcv_skb
>       3.78   2%      -2.2        1.57   2%  perf-profile.children.cycles-pp.udp_queue_rcv_one_skb
>       2.23   2%      -1.6        0.65   2%  perf-profile.children.cycles-pp.__ip_make_skb
>       1.95   2%      -1.4        0.52   3%  perf-profile.children.cycles-pp.__ip_select_ident
>       1.51   4%      -1.2        0.34        perf-profile.children.cycles-pp.free_unref_page_commit
>       1.17            -0.7        0.51   2%  perf-profile.children.cycles-pp.ip_route_output_flow
>       1.15            -0.6        0.52        perf-profile.children.cycles-pp.sock_alloc_send_pskb
>       0.91            -0.5        0.39        perf-profile.children.cycles-pp.alloc_skb_with_frags
>       0.86            -0.5        0.37        perf-profile.children.cycles-pp.__alloc_skb
>       0.83            -0.5        0.36   2%  perf-profile.children.cycles-pp.ip_route_output_key_hash_rcu
>       0.75            -0.4        0.32        perf-profile.children.cycles-pp.dev_hard_start_xmit
>       0.72            -0.4        0.31   3%  perf-profile.children.cycles-pp.fib_table_lookup
>       0.67            -0.4        0.28        perf-profile.children.cycles-pp.loopback_xmit
>       0.70   2%      -0.4        0.33        perf-profile.children.cycles-pp.__zone_watermark_ok
>       0.47   4%      -0.3        0.15        perf-profile.children.cycles-pp.kmem_cache_free
>       0.57            -0.3        0.26        perf-profile.children.cycles-pp.kmem_cache_alloc_node
>       0.46            -0.3        0.18   2%  perf-profile.children.cycles-pp.ip_rcv
>       0.42            -0.3        0.17        perf-profile.children.cycles-pp.move_addr_to_kernel
>       0.41            -0.2        0.16   2%  perf-profile.children.cycles-pp.__udp4_lib_lookup
>       0.32            -0.2        0.13        perf-profile.children.cycles-pp.__netif_rx
>       0.30            -0.2        0.12        perf-profile.children.cycles-pp.netif_rx_internal
>       0.30            -0.2        0.12        perf-profile.children.cycles-pp._copy_from_user
>       0.31            -0.2        0.13        perf-profile.children.cycles-pp.kmalloc_reserve
>       0.63            -0.2        0.46   2%  perf-profile.children.cycles-pp.free_unref_page_prepare
>       0.28            -0.2        0.11        perf-profile.children.cycles-pp.enqueue_to_backlog
>       0.27            -0.2        0.11        perf-profile.children.cycles-pp.udp4_lib_lookup2
>       0.29            -0.2        0.13   6%  perf-profile.children.cycles-pp.send_data
>       0.25            -0.2        0.10        perf-profile.children.cycles-pp.__netif_receive_skb_core
>       0.23   2%      -0.1        0.10   4%  perf-profile.children.cycles-pp.security_socket_sendmsg
>       0.19   2%      -0.1        0.06        perf-profile.children.cycles-pp.ip_rcv_core
>       0.37            -0.1        0.24        perf-profile.children.cycles-pp.irqtime_account_irq
>       0.21            -0.1        0.08        perf-profile.children.cycles-pp.sock_wfree
>       0.21   3%      -0.1        0.08        perf-profile.children.cycles-pp.validate_xmit_skb
>       0.20   2%      -0.1        0.08        perf-profile.children.cycles-pp.ip_output
>       0.22   2%      -0.1        0.10   4%  perf-profile.children.cycles-pp.ip_rcv_finish_core
>       0.20   6%      -0.1        0.09   5%  perf-profile.children.cycles-pp.__mkroute_output
>       0.21   2%      -0.1        0.09   5%  perf-profile.children.cycles-pp._raw_spin_lock_irq
>       0.28            -0.1        0.18        perf-profile.children.cycles-pp._raw_spin_trylock
>       0.34   3%      -0.1        0.25        perf-profile.children.cycles-pp.__slab_free
>       0.13   3%      -0.1        0.05        perf-profile.children.cycles-pp.siphash_3u32
>       0.12   4%      -0.1        0.03  70%  perf-profile.children.cycles-pp.ipv4_pktinfo_prepare
>       0.14   3%      -0.1        0.06   7%  perf-profile.children.cycles-pp.__ip_local_out
>       0.20   2%      -0.1        0.12        perf-profile.children.cycles-pp.aa_sk_perm
>       0.18   2%      -0.1        0.10        perf-profile.children.cycles-pp.get_pfnblock_flags_mask
>       0.12   3%      -0.1        0.05        perf-profile.children.cycles-pp.sk_filter_trim_cap
>       0.13            -0.1        0.06        perf-profile.children.cycles-pp.ip_setup_cork
>       0.13   7%      -0.1        0.06   8%  perf-profile.children.cycles-pp.fib_lookup_good_nhc
>       0.15   3%      -0.1        0.08   5%  perf-profile.children.cycles-pp.skb_set_owner_w
>       0.11   4%      -0.1        0.05        perf-profile.children.cycles-pp.dst_release
>       0.23   2%      -0.1        0.17   2%  perf-profile.children.cycles-pp.__entry_text_start
>       0.11            -0.1        0.05        perf-profile.children.cycles-pp.ipv4_mtu
>       0.20   2%      -0.1        0.15   3%  perf-profile.children.cycles-pp.__list_add_valid_or_report
>       0.10            -0.1        0.05        perf-profile.children.cycles-pp.ip_send_check
>       0.31   2%      -0.0        0.26   3%  perf-profile.children.cycles-pp.sockfd_lookup_light
>       0.27            -0.0        0.22   2%  perf-profile.children.cycles-pp.__fget_light
>       0.63            -0.0        0.58        perf-profile.children.cycles-pp.__check_object_size
>       0.15   3%      -0.0        0.11        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.13            -0.0        0.09   5%  perf-profile.children.cycles-pp.alloc_pages
>       0.27            -0.0        0.24        perf-profile.children.cycles-pp.sched_clock_cpu
>       0.11   4%      -0.0        0.08   6%  perf-profile.children.cycles-pp.__cond_resched
>       0.14   3%      -0.0        0.11        perf-profile.children.cycles-pp.free_tail_page_prepare
>       0.11            -0.0        0.08   5%  perf-profile.children.cycles-pp.syscall_return_via_sysret
>       0.09   9%      -0.0        0.06   7%  perf-profile.children.cycles-pp.__xfrm_policy_check2
>       0.23   2%      -0.0        0.21   2%  perf-profile.children.cycles-pp.sched_clock
>       0.14   3%      -0.0        0.11   4%  perf-profile.children.cycles-pp.prep_compound_page
>       0.21   2%      -0.0        0.20   2%  perf-profile.children.cycles-pp.native_sched_clock
>       0.06            -0.0        0.05        perf-profile.children.cycles-pp.task_tick_fair
>       0.06            -0.0        0.05        perf-profile.children.cycles-pp.check_stack_object
>       0.18   2%      +0.0        0.20   2%  perf-profile.children.cycles-pp.perf_event_task_tick
>       0.18   2%      +0.0        0.19   2%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
>       0.31   3%      +0.0        0.33        perf-profile.children.cycles-pp.tick_sched_handle
>       0.31   3%      +0.0        0.33        perf-profile.children.cycles-pp.update_process_times
>       0.41   2%      +0.0        0.43        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       0.40   2%      +0.0        0.42        perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.32   2%      +0.0        0.34        perf-profile.children.cycles-pp.tick_sched_timer
>       0.36   2%      +0.0        0.39        perf-profile.children.cycles-pp.__hrtimer_run_queues
>       0.06   7%      +0.0        0.10   4%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       0.05   8%      +0.0        0.10        perf-profile.children.cycles-pp._raw_spin_lock_bh
>       0.00            +0.1        0.05        perf-profile.children.cycles-pp.update_cfs_group
>       0.00            +0.1        0.05        perf-profile.children.cycles-pp.cpuidle_governor_latency_req
>       0.00            +0.1        0.05        perf-profile.children.cycles-pp.flush_smp_call_function_queue
>       0.00            +0.1        0.05   8%  perf-profile.children.cycles-pp.prepare_to_wait_exclusive
>       0.07            +0.1        0.13   3%  perf-profile.children.cycles-pp.__mod_zone_page_state
>       0.00            +0.1        0.06  13%  perf-profile.children.cycles-pp.cgroup_rstat_updated
>       0.00            +0.1        0.06        perf-profile.children.cycles-pp.__x2apic_send_IPI_dest
>       0.00            +0.1        0.06        perf-profile.children.cycles-pp.security_socket_recvmsg
>       0.00            +0.1        0.06        perf-profile.children.cycles-pp.select_task_rq_fair
>       0.00            +0.1        0.06        perf-profile.children.cycles-pp.tick_irq_enter
>       0.00            +0.1        0.06        perf-profile.children.cycles-pp.tick_nohz_idle_enter
>       0.42   2%      +0.1        0.49   2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>       0.00            +0.1        0.07   7%  perf-profile.children.cycles-pp.ktime_get
>       0.00            +0.1        0.07        perf-profile.children.cycles-pp.__get_user_4
>       0.00            +0.1        0.07        perf-profile.children.cycles-pp.update_rq_clock
>       0.00            +0.1        0.07        perf-profile.children.cycles-pp.select_task_rq
>       0.00            +0.1        0.07        perf-profile.children.cycles-pp.native_apic_msr_eoi
>       0.49            +0.1        0.57   2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       0.11  11%      +0.1        0.19   2%  perf-profile.children.cycles-pp._raw_spin_lock
>       0.00            +0.1        0.08        perf-profile.children.cycles-pp.update_rq_clock_task
>       0.00            +0.1        0.08        perf-profile.children.cycles-pp.__update_load_avg_se
>       0.00            +0.1        0.09   5%  perf-profile.children.cycles-pp.irq_enter_rcu
>       0.00            +0.1        0.09   5%  perf-profile.children.cycles-pp.__irq_exit_rcu
>       0.00            +0.1        0.09        perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
>       0.00            +0.1        0.09        perf-profile.children.cycles-pp.update_blocked_averages
>       0.00            +0.1        0.09        perf-profile.children.cycles-pp.update_sg_lb_stats
>       0.00            +0.1        0.09   5%  perf-profile.children.cycles-pp.set_next_entity
>       0.00            +0.1        0.10        perf-profile.children.cycles-pp.__switch_to_asm
>       0.00            +0.1        0.11  12%  perf-profile.children.cycles-pp._copy_to_user
>       0.00            +0.1        0.12   3%  perf-profile.children.cycles-pp.menu_select
>       0.00            +0.1        0.12   3%  perf-profile.children.cycles-pp.recv_data
>       0.00            +0.1        0.12   3%  perf-profile.children.cycles-pp.update_sd_lb_stats
>       0.00            +0.1        0.13   3%  perf-profile.children.cycles-pp.native_irq_return_iret
>       0.00            +0.1        0.13   3%  perf-profile.children.cycles-pp.__switch_to
>       0.00            +0.1        0.13   3%  perf-profile.children.cycles-pp.find_busiest_group
>       0.00            +0.1        0.14        perf-profile.children.cycles-pp.finish_task_switch
>       0.00            +0.1        0.15   3%  perf-profile.children.cycles-pp.update_curr
>       0.00            +0.2        0.15   3%  perf-profile.children.cycles-pp.mem_cgroup_uncharge_skmem
>       0.00            +0.2        0.16        perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.05            +0.2        0.22   2%  perf-profile.children.cycles-pp.page_counter_try_charge
>       0.00            +0.2        0.17   2%  perf-profile.children.cycles-pp.load_balance
>       0.00            +0.2        0.17   2%  perf-profile.children.cycles-pp.___perf_sw_event
>       0.02 141%      +0.2        0.19   2%  perf-profile.children.cycles-pp.page_counter_uncharge
>       0.33            +0.2        0.52        perf-profile.children.cycles-pp.__free_one_page
>       0.02 141%      +0.2        0.21   2%  perf-profile.children.cycles-pp.drain_stock
>       0.00            +0.2        0.20   2%  perf-profile.children.cycles-pp.prepare_task_switch
>       0.16   3%      +0.2        0.38   2%  perf-profile.children.cycles-pp.simple_copy_to_iter
>       0.07  11%      +0.2        0.31        perf-profile.children.cycles-pp.refill_stock
>       0.07   6%      +0.2        0.31   4%  perf-profile.children.cycles-pp.move_addr_to_user
>       0.00            +0.2        0.24        perf-profile.children.cycles-pp.enqueue_entity
>       0.00            +0.2        0.25        perf-profile.children.cycles-pp.update_load_avg
>       0.21   2%      +0.3        0.48        perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
>       0.00            +0.3        0.31   4%  perf-profile.children.cycles-pp.dequeue_entity
>       0.08   5%      +0.3        0.40   3%  perf-profile.children.cycles-pp.try_charge_memcg
>       0.00            +0.3        0.33        perf-profile.children.cycles-pp.enqueue_task_fair
>       0.00            +0.4        0.35   2%  perf-profile.children.cycles-pp.dequeue_task_fair
>       0.00            +0.4        0.35   2%  perf-profile.children.cycles-pp.activate_task
>       0.00            +0.4        0.36   2%  perf-profile.children.cycles-pp.try_to_wake_up
>       0.00            +0.4        0.37   2%  perf-profile.children.cycles-pp.autoremove_wake_function
>       0.00            +0.4        0.39   3%  perf-profile.children.cycles-pp.newidle_balance
>       0.12   8%      +0.4        0.51   2%  perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
>       0.00            +0.4        0.39        perf-profile.children.cycles-pp.ttwu_do_activate
>       0.00            +0.4        0.40   2%  perf-profile.children.cycles-pp.__wake_up_common
>       0.18   4%      +0.4        0.59        perf-profile.children.cycles-pp.udp_rmem_release
>       0.11   7%      +0.4        0.52        perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
>       0.00            +0.4        0.43        perf-profile.children.cycles-pp.__wake_up_common_lock
>       0.00            +0.5        0.46        perf-profile.children.cycles-pp.sched_ttwu_pending
>       0.00            +0.5        0.49        perf-profile.children.cycles-pp.sock_def_readable
>       0.00            +0.5        0.53   2%  perf-profile.children.cycles-pp.pick_next_task_fair
>       0.00            +0.5        0.54   2%  perf-profile.children.cycles-pp.schedule_idle
>       0.00            +0.6        0.55        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>       0.15   3%      +0.6        0.73   2%  perf-profile.children.cycles-pp.__sk_mem_raise_allocated
>       0.00            +0.6        0.57        perf-profile.children.cycles-pp.__sysvec_call_function_single
>       0.16   5%      +0.6        0.74   2%  perf-profile.children.cycles-pp.__sk_mem_schedule
>       0.00            +0.8        0.78        perf-profile.children.cycles-pp.sysvec_call_function_single
>       0.41   3%      +0.9        1.33   2%  perf-profile.children.cycles-pp.__udp_enqueue_schedule_skb
>       0.00            +1.2        1.16   2%  perf-profile.children.cycles-pp.schedule
>       0.00            +1.2        1.21   2%  perf-profile.children.cycles-pp.schedule_timeout
>       0.00            +1.3        1.33   2%  perf-profile.children.cycles-pp.__skb_wait_for_more_packets
>       0.00            +1.7        1.66   2%  perf-profile.children.cycles-pp.__schedule
>       0.27   3%      +2.0        2.25        perf-profile.children.cycles-pp.__skb_recv_udp
>      50.41            +2.4       52.81        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       0.00            +2.7        2.68        perf-profile.children.cycles-pp.asm_sysvec_call_function_single
>      49.78            +2.7       52.49        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       0.00            +3.0        2.98        perf-profile.children.cycles-pp.acpi_safe_halt
>       0.00            +3.0        3.00        perf-profile.children.cycles-pp.acpi_idle_enter
>      49.51            +3.1       52.57        perf-profile.children.cycles-pp.send_udp_stream
>      49.50            +3.1       52.56        perf-profile.children.cycles-pp.send_omni_inner
>       0.00            +3.1        3.10        perf-profile.children.cycles-pp.cpuidle_enter_state
>       0.00            +3.1        3.12        perf-profile.children.cycles-pp.cpuidle_enter
>       0.00            +3.4        3.37        perf-profile.children.cycles-pp.cpuidle_idle_call
>      48.90            +3.4       52.30        perf-profile.children.cycles-pp.sendto
>      47.85            +4.0       51.83        perf-profile.children.cycles-pp.__x64_sys_sendto
>      47.73            +4.0       51.77        perf-profile.children.cycles-pp.__sys_sendto
>       0.00            +4.1        4.10        perf-profile.children.cycles-pp.start_secondary
>       0.00            +4.1        4.13        perf-profile.children.cycles-pp.do_idle
>       0.00            +4.1        4.14        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
>       0.00            +4.1        4.14        perf-profile.children.cycles-pp.cpu_startup_entry
>      46.54            +4.7       51.28        perf-profile.children.cycles-pp.sock_sendmsg
>      46.10            +5.0       51.11        perf-profile.children.cycles-pp.udp_sendmsg
>       3.70            +8.0       11.71        perf-profile.children.cycles-pp.copyout
>       3.71            +8.1       11.80        perf-profile.children.cycles-pp._copy_to_iter
>       3.96            +8.5       12.43        perf-profile.children.cycles-pp.__skb_datagram_iter
>       3.96            +8.5       12.44        perf-profile.children.cycles-pp.skb_copy_datagram_iter
>      35.14           +11.3       46.40        perf-profile.children.cycles-pp.ip_make_skb
>      32.71   2%     +13.0       45.66        perf-profile.children.cycles-pp.__ip_append_data
>      10.28           +20.6       30.89        perf-profile.children.cycles-pp.sk_page_frag_refill
>      10.25           +20.6       30.88        perf-profile.children.cycles-pp.skb_page_frag_refill
>       9.86           +20.8       30.63        perf-profile.children.cycles-pp.__alloc_pages
>       9.62           +20.8       30.42        perf-profile.children.cycles-pp.get_page_from_freelist
>       8.42           +21.3       29.72        perf-profile.children.cycles-pp.rmqueue
>       6.47           +22.8       29.22        perf-profile.children.cycles-pp.rmqueue_bulk
>      19.11   2%      -6.0       13.08        perf-profile.self.cycles-pp.copyin
>       1.81   2%      -1.4        0.39        perf-profile.self.cycles-pp.rmqueue
>       1.81   2%      -1.3        0.46   2%  perf-profile.self.cycles-pp.__ip_select_ident
>       1.47   4%      -1.2        0.31        perf-profile.self.cycles-pp.free_unref_page_commit
>       1.29   2%      -0.5        0.75        perf-profile.self.cycles-pp.__ip_append_data
>       0.71            -0.4        0.29        perf-profile.self.cycles-pp.udp_sendmsg
>       0.68   2%      -0.4        0.32        perf-profile.self.cycles-pp.__zone_watermark_ok
>       0.50            -0.3        0.16        perf-profile.self.cycles-pp.skb_release_data
>       0.59   3%      -0.3        0.26   3%  perf-profile.self.cycles-pp.fib_table_lookup
>       0.46   4%      -0.3        0.15   3%  perf-profile.self.cycles-pp.kmem_cache_free
>       0.63            -0.3        0.33   2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.47            -0.3        0.19        perf-profile.self.cycles-pp.__sys_sendto
>       0.44            -0.2        0.21   2%  perf-profile.self.cycles-pp.kmem_cache_alloc_node
>       0.36            -0.2        0.16   3%  perf-profile.self.cycles-pp.send_omni_inner
>       0.35   2%      -0.2        0.15   3%  perf-profile.self.cycles-pp.ip_finish_output2
>       0.29            -0.2        0.12        perf-profile.self.cycles-pp._copy_from_user
>       0.24            -0.1        0.10   4%  perf-profile.self.cycles-pp.__netif_receive_skb_core
>       0.22   2%      -0.1        0.08   5%  perf-profile.self.cycles-pp.free_unref_page
>       0.19   2%      -0.1        0.06        perf-profile.self.cycles-pp.ip_rcv_core
>       0.21   2%      -0.1        0.08        perf-profile.self.cycles-pp.__alloc_skb
>       0.20   2%      -0.1        0.08        perf-profile.self.cycles-pp.sock_wfree
>       0.22   2%      -0.1        0.10   4%  perf-profile.self.cycles-pp.send_data
>       0.21            -0.1        0.09        perf-profile.self.cycles-pp.sendto
>       0.21   2%      -0.1        0.10   4%  perf-profile.self.cycles-pp.ip_rcv_finish_core
>       0.21   2%      -0.1        0.09   5%  perf-profile.self.cycles-pp.__ip_make_skb
>       0.20   4%      -0.1        0.09   5%  perf-profile.self.cycles-pp._raw_spin_lock_irq
>       0.21   2%      -0.1        0.10   4%  perf-profile.self.cycles-pp.__dev_queue_xmit
>       0.38   3%      -0.1        0.27        perf-profile.self.cycles-pp.get_page_from_freelist
>       0.20   2%      -0.1        0.09        perf-profile.self.cycles-pp.udp_send_skb
>       0.18   2%      -0.1        0.07        perf-profile.self.cycles-pp.__udp_enqueue_schedule_skb
>       0.18   4%      -0.1        0.08   6%  perf-profile.self.cycles-pp.__mkroute_output
>       0.25            -0.1        0.15   3%  perf-profile.self.cycles-pp._copy_from_iter
>       0.27   4%      -0.1        0.17   2%  perf-profile.self.cycles-pp.skb_page_frag_refill
>       0.16            -0.1        0.06   7%  perf-profile.self.cycles-pp.sock_sendmsg
>       0.33   2%      -0.1        0.24        perf-profile.self.cycles-pp.__slab_free
>       0.15   3%      -0.1        0.06        perf-profile.self.cycles-pp.udp4_lib_lookup2
>       0.38   2%      -0.1        0.29   2%  perf-profile.self.cycles-pp.free_unref_page_prepare
>       0.26            -0.1        0.17        perf-profile.self.cycles-pp._raw_spin_trylock
>       0.15            -0.1        0.06        perf-profile.self.cycles-pp.ip_output
>       0.14            -0.1        0.05   8%  perf-profile.self.cycles-pp.process_backlog
>       0.14            -0.1        0.06        perf-profile.self.cycles-pp.ip_route_output_flow
>       0.14            -0.1        0.06        perf-profile.self.cycles-pp.__udp4_lib_lookup
>       0.21   2%      -0.1        0.13   3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.12   3%      -0.1        0.05        perf-profile.self.cycles-pp.siphash_3u32
>       0.13   3%      -0.1        0.06   8%  perf-profile.self.cycles-pp.ip_send_skb
>       0.17            -0.1        0.10        perf-profile.self.cycles-pp.__do_softirq
>       0.15   3%      -0.1        0.08   5%  perf-profile.self.cycles-pp.skb_set_owner_w
>       0.17   2%      -0.1        0.10   4%  perf-profile.self.cycles-pp.aa_sk_perm
>       0.12            -0.1        0.05        perf-profile.self.cycles-pp.__x64_sys_sendto
>       0.12   6%      -0.1        0.05        perf-profile.self.cycles-pp.fib_lookup_good_nhc
>       0.19   2%      -0.1        0.13        perf-profile.self.cycles-pp.__list_add_valid_or_report
>       0.14   3%      -0.1        0.07   6%  perf-profile.self.cycles-pp.net_rx_action
>       0.16   2%      -0.1        0.10        perf-profile.self.cycles-pp.do_syscall_64
>       0.11            -0.1        0.05        perf-profile.self.cycles-pp.__udp4_lib_rcv
>       0.16   3%      -0.1        0.10   4%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
>       0.11   4%      -0.1        0.05        perf-profile.self.cycles-pp.ip_route_output_key_hash_rcu
>       0.10   4%      -0.1        0.05        perf-profile.self.cycles-pp.ip_generic_getfrag
>       0.10            -0.1        0.05        perf-profile.self.cycles-pp.ipv4_mtu
>       0.26            -0.0        0.21   2%  perf-profile.self.cycles-pp.__fget_light
>       0.15   3%      -0.0        0.11   4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.24            -0.0        0.20   2%  perf-profile.self.cycles-pp.__alloc_pages
>       0.15   3%      -0.0        0.12        perf-profile.self.cycles-pp.__check_object_size
>       0.11            -0.0        0.08   6%  perf-profile.self.cycles-pp.syscall_return_via_sysret
>       0.08   5%      -0.0        0.05        perf-profile.self.cycles-pp.loopback_xmit
>       0.13            -0.0        0.11   4%  perf-profile.self.cycles-pp.prep_compound_page
>       0.11            -0.0        0.09   5%  perf-profile.self.cycles-pp.irqtime_account_irq
>       0.09  10%      -0.0        0.06   7%  perf-profile.self.cycles-pp.__xfrm_policy_check2
>       0.07            -0.0        0.05        perf-profile.self.cycles-pp.alloc_pages
>       0.08            -0.0        0.06   7%  perf-profile.self.cycles-pp.__entry_text_start
>       0.09   5%      -0.0        0.07        perf-profile.self.cycles-pp.free_tail_page_prepare
>       0.10            +0.0        0.11        perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
>       0.06            +0.0        0.08   6%  perf-profile.self.cycles-pp.free_pcppages_bulk
>       0.05   8%      +0.0        0.10   4%  perf-profile.self.cycles-pp._raw_spin_lock_bh
>       0.07            +0.0        0.12        perf-profile.self.cycles-pp.__mod_zone_page_state
>       0.00            +0.1        0.05        perf-profile.self.cycles-pp.cpuidle_idle_call
>       0.00            +0.1        0.05        perf-profile.self.cycles-pp.udp_rmem_release
>       0.00            +0.1        0.05        perf-profile.self.cycles-pp.__flush_smp_call_function_queue
>       0.00            +0.1        0.05        perf-profile.self.cycles-pp.sock_def_readable
>       0.00            +0.1        0.05        perf-profile.self.cycles-pp.update_cfs_group
>       0.11  11%      +0.1        0.17   2%  perf-profile.self.cycles-pp._raw_spin_lock
>       0.00            +0.1        0.05   8%  perf-profile.self.cycles-pp.finish_task_switch
>       0.00            +0.1        0.05   8%  perf-profile.self.cycles-pp.cgroup_rstat_updated
>       0.00            +0.1        0.06        perf-profile.self.cycles-pp.do_idle
>       0.00            +0.1        0.06        perf-profile.self.cycles-pp.__skb_wait_for_more_packets
>       0.00            +0.1        0.06        perf-profile.self.cycles-pp.__x2apic_send_IPI_dest
>       0.00            +0.1        0.06   7%  perf-profile.self.cycles-pp.enqueue_entity
>       0.00            +0.1        0.07   7%  perf-profile.self.cycles-pp.schedule_timeout
>       0.00            +0.1        0.07   7%  perf-profile.self.cycles-pp.move_addr_to_user
>       0.00            +0.1        0.07   7%  perf-profile.self.cycles-pp.menu_select
>       0.00            +0.1        0.07   7%  perf-profile.self.cycles-pp.native_apic_msr_eoi
>       0.00            +0.1        0.07   7%  perf-profile.self.cycles-pp.update_sg_lb_stats
>       0.00            +0.1        0.07        perf-profile.self.cycles-pp.__update_load_avg_se
>       0.00            +0.1        0.07        perf-profile.self.cycles-pp.__get_user_4
>       0.00            +0.1        0.08   6%  perf-profile.self.cycles-pp.__sk_mem_reduce_allocated
>       0.00            +0.1        0.08        perf-profile.self.cycles-pp.update_curr
>       0.00            +0.1        0.08   5%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
>       0.00            +0.1        0.09   5%  perf-profile.self.cycles-pp.try_to_wake_up
>       0.00            +0.1        0.09        perf-profile.self.cycles-pp.recvfrom
>       0.00            +0.1        0.09        perf-profile.self.cycles-pp.mem_cgroup_charge_skmem
>       0.00            +0.1        0.09        perf-profile.self.cycles-pp.update_load_avg
>       0.00            +0.1        0.09   5%  perf-profile.self.cycles-pp.enqueue_task_fair
>       0.00            +0.1        0.10   4%  perf-profile.self.cycles-pp._copy_to_iter
>       0.00            +0.1        0.10   4%  perf-profile.self.cycles-pp.newidle_balance
>       0.00            +0.1        0.10   4%  perf-profile.self.cycles-pp.recv_data
>       0.00            +0.1        0.10        perf-profile.self.cycles-pp.refill_stock
>       0.00            +0.1        0.10        perf-profile.self.cycles-pp.__switch_to_asm
>       0.00            +0.1        0.11  15%  perf-profile.self.cycles-pp._copy_to_user
>       0.00            +0.1        0.12        perf-profile.self.cycles-pp.recv_omni
>       0.00            +0.1        0.12        perf-profile.self.cycles-pp.mem_cgroup_uncharge_skmem
>       0.00            +0.1        0.13   3%  perf-profile.self.cycles-pp.native_irq_return_iret
>       0.00            +0.1        0.13        perf-profile.self.cycles-pp.__switch_to
>       0.06            +0.1        0.20   2%  perf-profile.self.cycles-pp.rmqueue_bulk
>       0.09   5%      +0.1        0.23   4%  perf-profile.self.cycles-pp.udp_recvmsg
>       0.00            +0.1        0.14   3%  perf-profile.self.cycles-pp.__skb_recv_udp
>       0.00            +0.1        0.14   3%  perf-profile.self.cycles-pp.___perf_sw_event
>       0.08            +0.1        0.22   2%  perf-profile.self.cycles-pp.__skb_datagram_iter
>       0.03  70%      +0.2        0.20   4%  perf-profile.self.cycles-pp.page_counter_try_charge
>       0.02 141%      +0.2        0.18   4%  perf-profile.self.cycles-pp.__sys_recvfrom
>       0.00            +0.2        0.17   2%  perf-profile.self.cycles-pp.__schedule
>       0.00            +0.2        0.17   2%  perf-profile.self.cycles-pp.try_charge_memcg
>       0.00            +0.2        0.17   2%  perf-profile.self.cycles-pp.page_counter_uncharge
>       0.00            +0.2        0.21   2%  perf-profile.self.cycles-pp.__sk_mem_raise_allocated
>       0.14   3%      +0.2        0.36        perf-profile.self.cycles-pp.__free_one_page
>       0.20   2%      +0.3        0.47        perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
>       0.00            +2.1        2.07   2%  perf-profile.self.cycles-pp.acpi_safe_halt
>      49.78            +2.7       52.49        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>       3.68            +8.0       11.64        perf-profile.self.cycles-pp.copyout
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.

Patch

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 585c66fce9d9..f1e79263fe61 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -950,6 +950,7 @@  static int cacheinfo_cpu_online(unsigned int cpu)
 	if (rc)
 		goto err;
 	update_per_cpu_data_slice_size(true, cpu);
+	setup_pcp_cacheinfo();
 	return 0;
 err:
 	free_cache_attributes(cpu);
@@ -963,6 +964,7 @@  static int cacheinfo_cpu_pre_down(unsigned int cpu)
 
 	free_cache_attributes(cpu);
 	update_per_cpu_data_slice_size(false, cpu);
+	setup_pcp_cacheinfo();
 	return 0;
 }
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 665f06675c83..665edc11fb9f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -325,6 +325,7 @@  void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
 void page_alloc_init_late(void);
+void setup_pcp_cacheinfo(void);
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 19c40a6f7e45..cdff247e8c6f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -682,8 +682,14 @@  enum zone_watermarks {
  * PCPF_PREV_FREE_HIGH_ORDER: a high-order page is freed in the
  * previous page freeing.  To avoid to drain PCP for an accident
  * high-order page freeing.
+ *
+ * PCPF_FREE_HIGH_BATCH: if the data cache slice of the CPU is large
+ * enough, preserve "pcp->batch" pages in PCP before draining it for
+ * consecutive high-order page freeing without allocation.  This
+ * reduces zone lock contention while keeping cache-hot page reuse.
  */
 #define	PCPF_PREV_FREE_HIGH_ORDER	BIT(0)
+#define	PCPF_FREE_HIGH_BATCH		BIT(1)
 
 struct per_cpu_pages {
 	spinlock_t lock;	/* Protects lists field */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 295e61f0c49d..ba2d8f06523e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -52,6 +52,7 @@ 
 #include <linux/psi.h>
 #include <linux/khugepaged.h>
 #include <linux/delayacct.h>
+#include <linux/cacheinfo.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"
@@ -2385,7 +2386,9 @@  static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
 		free_high = (pcp->free_factor &&
-			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER));
+			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
+			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
+			      pcp->count >= READ_ONCE(pcp->batch)));
 		pcp->flags |= PCPF_PREV_FREE_HIGH_ORDER;
 	} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
@@ -5418,6 +5421,39 @@  static void zone_pcp_update(struct zone *zone, int cpu_online)
 	mutex_unlock(&pcp_batch_high_lock);
 }
 
+static void zone_pcp_update_cacheinfo(struct zone *zone)
+{
+	int cpu;
+	struct per_cpu_pages *pcp;
+	struct cpu_cacheinfo *cci;
+
+	for_each_online_cpu(cpu) {
+		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+		cci = get_cpu_cacheinfo(cpu);
+		/*
+		 * If the data cache slice of the CPU is large enough,
+		 * "pcp->batch" pages can be preserved in PCP before draining
+		 * it for consecutive high-order page freeing without
+		 * allocation.  This reduces zone lock contention without
+		 * hurting cache-hot page sharing.
+		 */
+		spin_lock(&pcp->lock);
+		if ((cci->per_cpu_data_slice_size >> PAGE_SHIFT) > 3 * pcp->batch)
+			pcp->flags |= PCPF_FREE_HIGH_BATCH;
+		else
+			pcp->flags &= ~PCPF_FREE_HIGH_BATCH;
+		spin_unlock(&pcp->lock);
+	}
+}
+
+void setup_pcp_cacheinfo(void)
+{
+	struct zone *zone;
+
+	for_each_populated_zone(zone)
+		zone_pcp_update_cacheinfo(zone);
+}
+
 /*
  * Allocate per cpu pagesets and initialize them.
  * Before this call only boot pagesets were available.
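
For readers who want to see how the two mm/page_alloc.c hunks above interact, here is a minimal user-space sketch. It is not kernel code: struct sketch_pcp, the SKETCH_* macros and both function names are hypothetical stand-ins, a 4 KiB page size is assumed, and locking, READ_ONCE() and the pcp->count update are left out. It only mirrors the flag threshold from zone_pcp_update_cacheinfo() and the free_high condition from free_unref_page_commit().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; the kernel definitions differ. */
#define SKETCH_PAGE_SHIFT		12	/* assumes 4 KiB pages */
#define SKETCH_COSTLY_ORDER		3	/* mirrors PAGE_ALLOC_COSTLY_ORDER */
#define SKETCH_PREV_FREE_HIGH_ORDER	(1u << 0)
#define SKETCH_FREE_HIGH_BATCH		(1u << 1)

struct sketch_pcp {
	unsigned int flags;
	int count;		/* pages currently held in the PCP */
	int batch;
	int free_factor;
};

/*
 * Mirrors the threshold test in zone_pcp_update_cacheinfo(): set the
 * flag only if the cache slice, in pages, exceeds 3 * batch.
 */
static void sketch_update_cacheinfo(struct sketch_pcp *pcp, size_t slice_bytes)
{
	if ((slice_bytes >> SKETCH_PAGE_SHIFT) > (size_t)(3 * pcp->batch))
		pcp->flags |= SKETCH_FREE_HIGH_BATCH;
	else
		pcp->flags &= ~SKETCH_FREE_HIGH_BATCH;
}

/*
 * Mirrors the free_high decision in free_unref_page_commit(): drain on
 * consecutive high-order frees, but when the flag is set, only once at
 * least "batch" pages are already cached.
 */
static bool sketch_free_high(struct sketch_pcp *pcp, unsigned int order)
{
	bool free_high = false;

	if (order && order <= SKETCH_COSTLY_ORDER) {
		free_high = pcp->free_factor &&
			    (pcp->flags & SKETCH_PREV_FREE_HIGH_ORDER) &&
			    (!(pcp->flags & SKETCH_FREE_HIGH_BATCH) ||
			     pcp->count >= pcp->batch);
		pcp->flags |= SKETCH_PREV_FREE_HIGH_ORDER;
	} else if (pcp->flags & SKETCH_PREV_FREE_HIGH_ORDER) {
		pcp->flags &= ~SKETCH_PREV_FREE_HIGH_ORDER;
	}
	return free_high;
}
```

With batch = 63, a 2 MiB slice is 512 pages, which exceeds 3 * 63 = 189, so the flag is set and draining waits until count reaches batch. A 256 KiB slice is 64 pages, so the flag is cleared and the pre-patch behaviour (drain on the second consecutive high-order free) applies.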