From patchwork Tue Apr 27 02:37:58 2021
X-Patchwork-Submitter: "Song Bao Hua (Barry Song)"
X-Patchwork-Id: 12225255
From: Barry Song
CC: Barry Song, "Yongjia Xie"
Subject: [PATCH]
 sched/fair: don't use waker's cpu if the waker of sync wake-up is interrupt
Date: Tue, 27 Apr 2021 14:37:58 +1200
Message-ID: <20210427023758.4048-1-song.bao.hua@hisilicon.com>

A severe qperf performance decrease was reported in the use case below.

On hardware with 2 NUMA nodes, node0 has cpu0-31 and node1 has cpu32-63.
Ethernet is located on node1.

Run the commands below:
 $ taskset -c 32-63 stress -c 32 &
 $ qperf 192.168.50.166 tcp_lat
 tcp_lat: latency = 2.95ms

Normally the latency should be less than 20us. But in the above test,
latency increased dramatically to 2.95ms.

This is caused by qperf ping-ponging between node0 and node1. Since it is
a sync wake-up and the waker's nr_running == 1, WAKE_AFFINE will pull
qperf to node1, but LB will soon migrate qperf back to node0. Unlike a
normal sync wake-up coming from a task, the waker in the above test is an
interrupt, and nr_running happens to be 1 because stress starts 32 threads
on node1, which has 32 cpus.

Testing also shows that the performance of qperf won't drop if the number
of threads is increased to 64, 96 or larger values:
 $ taskset -c 32-63 stress -c 96 &
 $ qperf 192.168.50.166 tcp_lat
 tcp_lat: latency = 14.7us

Obviously "-c 96" makes "cpu_rq(this_cpu)->nr_running == 1" false in
wake_affine_idle(), so WAKE_AFFINE won't pull qperf to node1.

To fix this issue, this patch checks that the waker of a sync wake-up is a
task and not an interrupt.
In this case, the waker will schedule out and give the CPU to the wakee.

Reported-by: Yongjia Xie
Signed-off-by: Barry Song
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d73bdbb2d40..8ad2d732033d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5829,7 +5829,12 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
 		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
 
-	if (sync && cpu_rq(this_cpu)->nr_running == 1)
+	/*
+	 * If this is a sync wake-up and the only running thread is just
+	 * waker, thus, waker is not interrupt, we assume wakee will get
+	 * the cpu of waker soon
+	 */
+	if (sync && cpu_rq(this_cpu)->nr_running == 1 && in_task())
 		return this_cpu;
 
 	if (available_idle_cpu(prev_cpu))
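
For readers who want to see the effect of the extra condition outside the
kernel, here is a small user-space sketch of the decision in
wake_affine_idle(). It is only a model, not kernel code: struct cpu_model,
NO_TARGET, wake_affine_idle_model() and the waker_in_task and patched
parameters are all hypothetical names made up for illustration. The idle
and llc_id fields stand in for available_idle_cpu() and
cpus_share_cache(). The tail of the function (returning prev_cpu when it
is idle, otherwise a "no decision" value) is assumed; the hunk above only
shows the first line of it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU scheduler state; not a kernel type. */
struct cpu_model {
	unsigned int nr_running;	/* runnable tasks on this CPU's runqueue */
	bool idle;			/* models available_idle_cpu() */
	int llc_id;			/* equal ids model cpus_share_cache() */
};

/* Hypothetical "no decision" value returned when no CPU is chosen. */
#define NO_TARGET (-1)

/*
 * Model of the wake_affine_idle() decision.
 * waker_in_task models in_task(): false when the wake-up comes from an
 * interrupt. patched selects whether the new in_task() check is applied.
 */
static int wake_affine_idle_model(const struct cpu_model *cpus,
				  int this_cpu, int prev_cpu, bool sync,
				  bool waker_in_task, bool patched)
{
	/* An idle waker CPU sharing cache with prev_cpu: stay cache-local. */
	if (cpus[this_cpu].idle &&
	    cpus[this_cpu].llc_id == cpus[prev_cpu].llc_id)
		return cpus[prev_cpu].idle ? prev_cpu : this_cpu;

	/*
	 * Sync wake-up with a single runnable task on the waker's CPU.
	 * With the patch, this only counts if the waker really is a task.
	 */
	if (sync && cpus[this_cpu].nr_running == 1 &&
	    (!patched || waker_in_task))
		return this_cpu;

	/* Assumed tail: fall back to an idle prev_cpu, else no decision. */
	if (cpus[prev_cpu].idle)
		return prev_cpu;

	return NO_TARGET;
}
```

With index 0 standing for an idle CPU on node0 (prev_cpu) and index 1 for
a node1 CPU running one stress thread (this_cpu), a sync wake-up from an
interrupt selects CPU 1 in the unpatched model and CPU 0 in the patched
one. A sync wake-up from a task still selects CPU 1 in both. Raising
nr_running on CPU 1 to 3, as with "-c 96", makes both variants select
CPU 0, which matches the 14.7us measurement above.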