
[RFC,0/4] Reduce worst-case scanning of runqueues in select_idle_sibling

Message ID 20201207091516.24683-1-mgorman@techsingularity.net (mailing list archive)

Message

Mel Gorman Dec. 7, 2020, 9:15 a.m. UTC
This is a minimal series to reduce the amount of runqueue scanning in
select_idle_sibling in the worst case.

Patch 1 removes SIS_AVG_CPU because it's unused.

Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
	of scanning. It should be relatively uncontroversial.

Patches 3-4 scan the runqueues in a single pass for select_idle_core()
	and select_idle_cpu() so runqueues are not scanned twice. It's
	a tradeoff because it benefits deep scans but introduces overhead
	for shallow scans.

Even if patches 3-4 are rejected to allow more time for Aubrey's idle cpu mask
approach to stand on its own, patches 1-2 should be fine. The main decision
with patch 4 is whether select_idle_core() should do a full scan when searching
for an idle core, whether it should be throttled in some other fashion or
whether it should be just left alone.
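
For illustration, the rough shape of the single-pass idea in patches 3-4 is
sketched below. This is a simplified sketch rather than the actual patches:
the function name is invented, SMT corner cases are glossed over, and it
simply reuses the existing select_idle_mask scratch cpumask.

/*
 * Illustrative sketch (not the actual patches): scan the LLC once.  When a
 * core turns out to be busy, drop all of its SMT siblings from the candidate
 * mask so they are not visited again, and remember any idle CPU seen along
 * the way so a second select_idle_cpu() pass is unnecessary.
 */
static int select_idle_core_and_cpu(struct task_struct *p,
				    struct sched_domain *sd, int target)
{
	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
	int cpu, idle_cpu = -1;

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

	for_each_cpu_wrap(cpu, cpus, target) {
		bool idle_core = true;
		int sibling;

		for_each_cpu(sibling, cpu_smt_mask(cpu)) {
			if (!available_idle_cpu(sibling))
				idle_core = false;
			else if (idle_cpu == -1 &&
				 cpumask_test_cpu(sibling, p->cpus_ptr))
				idle_cpu = sibling;
		}

		/* Whole core is idle: use it. */
		if (idle_core)
			return cpu;

		/* Core is busy: do not visit its siblings again. */
		cpumask_andnot(cpus, cpus, cpu_smt_mask(cpu));
	}

	/* No idle core found: fall back to any idle CPU seen on the way. */
	return idle_cpu;
}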

Comments

Vincent Guittot Dec. 7, 2020, 3:04 p.m. UTC | #1
On Mon, 7 Dec 2020 at 10:15, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> This is a minimal series to reduce the amount of runqueue scanning in
> select_idle_sibling in the worst case.
>
> Patch 1 removes SIS_AVG_CPU because it's unused.
>
> Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
>         of scanning. It should be relatively uncontroversial
>
> Patches 3-4 scan the runqueues in a single pass for select_idle_core()
>         and select_idle_cpu() so runqueues are not scanned twice. It's
>         a tradeoff because it benefits deep scans but introduces overhead
>         for shallow scans.
>
> Even if patches 3-4 are rejected to allow more time for Aubrey's idle cpu mask

Patch 3 looks fine and doesn't collide with Aubrey's work. But I don't
like patch 4, which manipulates different cpumasks, including
load_balance_mask, outside of load balancing. I'd prefer to wait for v6
of Aubrey's patchset, which should fix the problem of possibly scanning
busy CPUs twice in select_idle_core() and select_idle_cpu().
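
For context, the tail of the current select_idle_sibling() is roughly the
following (paraphrased and wrapped in a hypothetical helper for readability),
which is why busy runqueues can be inspected twice when no idle core exists:

/*
 * Paraphrase of the current mainline behaviour (not a real function):
 * select_idle_core() walks the LLC looking for a fully idle core and, if
 * that fails, select_idle_cpu() walks largely the same CPUs again looking
 * for any idle CPU, so busy runqueues can be inspected twice.
 */
static int select_idle_sibling_tail(struct task_struct *p,
				    struct sched_domain *sd, int target)
{
	int i;

	i = select_idle_core(p, sd, target);
	if ((unsigned int)i < nr_cpumask_bits)
		return i;

	i = select_idle_cpu(p, sd, target);
	if ((unsigned int)i < nr_cpumask_bits)
		return i;

	i = select_idle_smt(p, sd, target);
	if ((unsigned int)i < nr_cpumask_bits)
		return i;

	return target;
}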



> approach to stand on its own, patches 1-2 should be fine. The main decision
> with patch 4 is whether select_idle_core() should do a full scan when searching
> for an idle core, whether it should be throttled in some other fashion or
> whether it should be just left alone.
>
> --
> 2.26.2
>
Mel Gorman Dec. 7, 2020, 3:42 p.m. UTC | #2
On Mon, Dec 07, 2020 at 04:04:41PM +0100, Vincent Guittot wrote:
> On Mon, 7 Dec 2020 at 10:15, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > This is a minimal series to reduce the amount of runqueue scanning in
> > select_idle_sibling in the worst case.
> >
> > Patch 1 removes SIS_AVG_CPU because it's unused.
> >
> > Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
> >         of scanning. It should be relatively uncontroversial
> >
> > Patches 3-4 scan the runqueues in a single pass for select_idle_core()
> >         and select_idle_cpu() so runqueues are not scanned twice. It's
> >         a tradeoff because it benefits deep scans but introduces overhead
> >         for shallow scans.
> >
> > Even if patches 3-4 are rejected to allow more time for Aubrey's idle cpu mask
> 
> Patch 3 looks fine and doesn't collide with Aubrey's work. But I don't
> like patch 4, which manipulates different cpumasks, including
> load_balance_mask, outside of load balancing. I'd prefer to wait for v6
> of Aubrey's patchset, which should fix the problem of possibly scanning
> busy CPUs twice in select_idle_core() and select_idle_cpu().
> 

Seems fair; we can see where we stand after V6 of Aubrey's work. A lot
of the motivation for patch 4 would go away if we managed to avoid calling
select_idle_core() unnecessarily. As it stands, we can call it a lot from
hackbench even though the chance of finding an idle core is minimal.

Assuming I revisit it, I'll update the schedstat debug patches to count
how many times select_idle_core() starts versus how many times it fails,
and see whether I can come up with a useful heuristic.
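
To make that concrete, the instrumentation could be as small as a pair of
schedstat counters, something like the sketch below (the two sd fields are
invented for illustration and do not exist today):

/*
 * Instrumentation sketch only: count how often select_idle_core() starts
 * a scan versus how often that scan fails to find an idle core.  The
 * sis_core_scan/sis_core_fail fields are invented for illustration.
 */
static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
			    int target)
{
	int core = -1;

	schedstat_inc(sd->sis_core_scan);

	/* ... existing search for a fully idle core sets 'core' ... */

	if (core < 0)
		schedstat_inc(sd->sis_core_fail);

	return core;
}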

I'll wait for more review on patches 1-3 and if I hear nothing, I'll
resend just those.

Thanks Vincent.
Aubrey Li Dec. 8, 2020, 2:06 a.m. UTC | #3
On 2020/12/7 23:42, Mel Gorman wrote:
> On Mon, Dec 07, 2020 at 04:04:41PM +0100, Vincent Guittot wrote:
>> On Mon, 7 Dec 2020 at 10:15, Mel Gorman <mgorman@techsingularity.net> wrote:
>>>
>>> This is a minimal series to reduce the amount of runqueue scanning in
>>> select_idle_sibling in the worst case.
>>>
>>> Patch 1 removes SIS_AVG_CPU because it's unused.
>>>
>>> Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
>>>         of scanning. It should be relatively uncontroversial
>>>
>>> Patches 3-4 scan the runqueues in a single pass for select_idle_core()
>>>         and select_idle_cpu() so runqueues are not scanned twice. It's
>>>         a tradeoff because it benefits deep scans but introduces overhead
>>>         for shallow scans.
>>>
>>> Even if patches 3-4 are rejected to allow more time for Aubrey's idle cpu mask
>>
>> Patch 3 looks fine and doesn't collide with Aubrey's work. But I don't
>> like patch 4, which manipulates different cpumasks, including
>> load_balance_mask, outside of load balancing. I'd prefer to wait for v6
>> of Aubrey's patchset, which should fix the problem of possibly scanning
>> busy CPUs twice in select_idle_core() and select_idle_cpu().
>>
> 
> Seems fair; we can see where we stand after V6 of Aubrey's work. A lot
> of the motivation for patch 4 would go away if we managed to avoid calling
> select_idle_core() unnecessarily. As it stands, we can call it a lot from
> hackbench even though the chance of finding an idle core is minimal.
> 

Sorry for the delay, I sent v6 out just now. Compared to v5, v6 follows
Vincent's suggestion to decouple the idle cpumask update from the stop_tick
signal; that is, the CPU is set in the idle cpumask every time it enters
idle. This should address Peter's concern about the Facebook tail-latency
workload, as I didn't see any regression in the schbench 99.0000th
percentile latency report.

However, I also didn't see any significant benefit so far; probably I
should put more load on the system. I'll do more characterization of the
uperf workload to see if I can find anything.
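
Roughly, the idea is that each CPU marks itself in a per-LLC cpumask as it
enters idle and clears itself when it picks up work, and select_idle_cpu()
then scans only that mask. A heavily simplified sketch (not the actual diff;
the idle_cpus_span field name is illustrative):

/*
 * Heavily simplified sketch, not the actual diff.  The idle_cpus_span
 * field is illustrative: set the CPU in a per-LLC idle cpumask whenever
 * it enters idle, independent of whether the tick is stopped, and clear
 * it when it leaves idle, so select_idle_cpu() can restrict its scan to
 * this mask instead of every runqueue in the LLC.
 */
static void update_idle_cpumask(int cpu, bool idle)
{
	struct sched_domain_shared *sds;

	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
	if (!sds)
		return;

	if (idle)
		cpumask_set_cpu(cpu, &sds->idle_cpus_span);
	else
		cpumask_clear_cpu(cpu, &sds->idle_cpus_span);
}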

Thanks,
-Aubrey