Message ID | 20201207091516.24683-1-mgorman@techsingularity.net (mailing list archive)
---|---
Series | Reduce worst-case scanning of runqueues in select_idle_sibling
On Mon, 7 Dec 2020 at 10:15, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> This is a minimal series to reduce the amount of runqueue scanning in
> select_idle_sibling in the worst case.
>
> Patch 1 removes SIS_AVG_CPU because it's unused.
>
> Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
> of scanning. It should be relatively uncontroversial
>
> Patch 3-4 scans the runqueues in a single pass for select_idle_core()
> and select_idle_cpu() so runqueues are not scanned twice. It's
> a tradeoff because it benefits deep scans but introduces overhead
> for shallow scans.
>
> Even if patch 3-4 is rejected to allow more time for Aubrey's idle cpu mask

Patch 3 looks fine and doesn't collide with Aubrey's work. But I don't
like patch 4, which manipulates different cpumasks, including
load_balance_mask, outside of load balancing, and I prefer to wait for
v6 of Aubrey's patchset, which should fix the problem of possibly
scanning busy CPUs twice in select_idle_core() and select_idle_cpu().

> approach to stand on its own, patches 1-2 should be fine. The main decision
> with patch 4 is whether select_idle_core() should do a full scan when searching
> for an idle core, whether it should be throttled in some other fashion or
> whether it should be just left alone.
>
> --
> 2.26.2
>
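To make the "single pass" idea from the quoted cover letter concrete: instead of select_idle_core() and select_idle_cpu() each walking the LLC, a combined scan can look for a fully idle core while remembering any idle CPU it passes, so a failed core search needs no second walk. The sketch below is illustrative only, not the actual patch; helper names such as select_idle_mask, for_each_cpu_wrap() and available_idle_cpu() follow kernel/sched/fair.c of that era, but the overall structure is an assumption.

```c
/*
 * Illustrative sketch of a single-pass scan (not the real patch): walk the
 * LLC once, checking each core for full idleness while remembering any idle
 * CPU seen along the way as a fallback.
 */
static int select_idle_core_or_cpu(struct task_struct *p, struct sched_domain *sd, int target)
{
	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
	int core, cpu, idle_cpu = -1;

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

	for_each_cpu_wrap(core, cpus, target) {
		bool idle_core = true;

		for_each_cpu(cpu, cpu_smt_mask(core)) {
			if (!available_idle_cpu(cpu))
				idle_core = false;
			else if (idle_cpu == -1)
				idle_cpu = cpu;	/* remember a fallback idle CPU */
		}

		if (idle_core)
			return core;	/* whole core idle: best case */

		/* don't visit these siblings again in this pass */
		cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
	}

	return idle_cpu;	/* -1 if nothing idle was found */
}
```

The tradeoff mentioned in the cover letter falls out of this shape: a deep scan does strictly less work than two separate walks, but a shallow scan that would have stopped early in select_idle_core() now pays the extra per-CPU bookkeeping.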
On Mon, Dec 07, 2020 at 04:04:41PM +0100, Vincent Guittot wrote:
> On Mon, 7 Dec 2020 at 10:15, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > This is a minimal series to reduce the amount of runqueue scanning in
> > select_idle_sibling in the worst case.
> >
> > Patch 1 removes SIS_AVG_CPU because it's unused.
> >
> > Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
> > of scanning. It should be relatively uncontroversial
> >
> > Patch 3-4 scans the runqueues in a single pass for select_idle_core()
> > and select_idle_cpu() so runqueues are not scanned twice. It's
> > a tradeoff because it benefits deep scans but introduces overhead
> > for shallow scans.
> >
> > Even if patch 3-4 is rejected to allow more time for Aubrey's idle cpu mask
>
> patch 3 looks fine and doesn't collide with Aubrey's work. But I don't
> like patch 4 which manipulates different cpumask including
> load_balance_mask out of LB and I prefer to wait for v6 of Aubrey's
> patchset which should fix the problem of possibly scanning twice busy
> cpus in select_idle_core and select_idle_cpu
>

Seems fair, we can see where we stand after V6 of Aubrey's work. A lot
of the motivation for patch 4 would go away if we managed to avoid calling
select_idle_core() unnecessarily. As it stands, we can call it a lot from
hackbench even though the chance of getting an idle core is minimal.

Assuming I revisit it, I'll update the schedstat debug patches to count
how many times select_idle_core() starts versus how many times it fails,
and see whether I can come up with a useful heuristic from that.

I'll wait for more review on patches 1-3 and if I hear nothing, I'll
resend just those. Thanks Vincent.
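The heuristic hinted at above might, for example, compare how often select_idle_core() is attempted with how often it fails and skip the core scan when failures dominate. The sketch below is purely hypothetical: the counters, the structure and the 75% threshold are invented for illustration and do not exist in kernel/sched/sched.h or in any posted patch.

```c
/*
 * Hypothetical throttling heuristic (invented for illustration): track
 * select_idle_core() attempts versus failures and skip the expensive core
 * scan once most recent attempts have come up empty.
 */
struct sic_stats {
	unsigned int	attempts;	/* select_idle_core() invocations */
	unsigned int	failures;	/* invocations that found no idle core */
};

static bool select_idle_core_worthwhile(const struct sic_stats *s)
{
	/* Assumption: if more than 75% of recent attempts failed, don't bother. */
	if (s->attempts >= 128 && s->failures * 4 > s->attempts * 3)
		return false;
	return true;
}
```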
On 2020/12/7 23:42, Mel Gorman wrote:
> On Mon, Dec 07, 2020 at 04:04:41PM +0100, Vincent Guittot wrote:
>> On Mon, 7 Dec 2020 at 10:15, Mel Gorman <mgorman@techsingularity.net> wrote:
>>>
>>> This is a minimal series to reduce the amount of runqueue scanning in
>>> select_idle_sibling in the worst case.
>>>
>>> Patch 1 removes SIS_AVG_CPU because it's unused.
>>>
>>> Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
>>> of scanning. It should be relatively uncontroversial
>>>
>>> Patch 3-4 scans the runqueues in a single pass for select_idle_core()
>>> and select_idle_cpu() so runqueues are not scanned twice. It's
>>> a tradeoff because it benefits deep scans but introduces overhead
>>> for shallow scans.
>>>
>>> Even if patch 3-4 is rejected to allow more time for Aubrey's idle cpu mask
>>
>> patch 3 looks fine and doesn't collide with Aubrey's work. But I don't
>> like patch 4 which manipulates different cpumask including
>> load_balance_mask out of LB and I prefer to wait for v6 of Aubrey's
>> patchset which should fix the problem of possibly scanning twice busy
>> cpus in select_idle_core and select_idle_cpu
>>
>
> Seems fair, we can see where we stand after V6 of Aubrey's work. A lot
> of the motivation for patch 4 would go away if we managed to avoid calling
> select_idle_core() unnecessarily. As it stands, we can call it a lot from
> hackbench even though the chance of getting an idle core are minimal.
>

Sorry for the delay, I sent v6 out just now. Compared to v5, v6 follows
Vincent's suggestion to decouple the idle cpumask update from the
stop_tick signal, that is, the CPU is set in the idle cpumask every time
it enters idle. This should address Peter's concern about the Facebook
tail-latency workload, as I didn't see any regression in the schbench
99.0000th percentile latency report. However, I also haven't seen any
significant benefit so far; probably I should put more load on the
system. I'll do more characterization of the uperf workload to see if I
can find anything.

Thanks,
-Aubrey
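The idea described above, setting a CPU in a per-LLC idle cpumask on every idle entry and clearing it when it picks up work, might be sketched roughly as below. This is an approximation of the concept, not the literal v6 patch: sds_idle_cpus() and the idle-span field it would return are assumed from the proposed series, while sd_llc_shared, cpumask_set_cpu() and cpumask_clear_cpu() are existing kernel symbols.

```c
/*
 * Rough sketch of a per-LLC idle cpumask update (illustrative, not the
 * actual patch): mark a CPU idle in the shared LLC mask when it enters
 * idle, clear it when it leaves, so select_idle_cpu() can scan this mask
 * instead of the whole LLC span.
 */
static void update_idle_cpumask(int cpu, bool entering_idle)
{
	struct sched_domain_shared *sds;

	rcu_read_lock();
	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
	if (sds) {
		if (entering_idle)
			cpumask_set_cpu(cpu, sds_idle_cpus(sds));   /* CPU went idle */
		else
			cpumask_clear_cpu(cpu, sds_idle_cpus(sds)); /* CPU picked up work */
	}
	rcu_read_unlock();
}
```

Decoupling this from the stop_tick signal means the mask reflects every idle entry, not just NOHZ-idle ones, which is what should help the tail-latency concern mentioned above.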