From patchwork Wed Jun 6 16:38:46 2018
X-Patchwork-Submitter: Jeremy Linton
X-Patchwork-Id: 10450643
From: Jeremy Linton
To: Sudeep.Holla@arm.com
Cc: Will.Deacon@arm.com, Catalin.Marinas@arm.com, Robin.Murphy@arm.com,
    Morten.Rasmussen@arm.com, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, geert@linux-m68k.org,
    linux-acpi@vger.kernel.org, ard.biesheuvel@linaro.org, Jeremy Linton
Subject: [PATCH v2] arm64: topology: Avoid checking numa mask for scheduler MC selection
Date: Wed, 6 Jun 2018 11:38:46 -0500
Message-Id: <20180606163846.495725-1-jeremy.linton@arm.com>
X-Mailer: git-send-email 2.14.3

The numa mask subset check can often lead to a system hang or crash during
CPU hotplug and system suspend operations if NUMA is disabled. This is
mostly observed on HMP systems where the CPU compute capacities are
different and end up in different scheduler domains. Since
cpumask_of_node() is returned instead of core_sibling, the scheduler gets
confused with incorrect cpumasks (e.g. one CPU in two different sched
domains at the same time) on CPU hotplug.

Let's disable the NUMA siblings checks for the time being, as
NUMA-in-socket machines have LLCs that ensure the scheduler topology
isn't "borken". The NUMA check exists to ensure that if an LLC within a
socket crosses NUMA nodes/chiplets, the scheduler domains remain
consistent. This code will likely have to be re-enabled in the near
future once the NUMA mask story is sorted. At the moment it isn't
necessary because on NUMA-in-socket machines the LLCs are contained
within the NUMA domains.

Further, as a defensive mechanism during hot-plug, let's ensure that the
LLC siblings are also masked. A simplified sketch of the resulting
cpu_coregroup_mask() selection follows the diff below.
Reported-by: Geert Uytterhoeven
Reviewed-by: Sudeep Holla
Signed-off-by: Jeremy Linton
Tested-by: Geert Uytterhoeven
---
 arch/arm64/kernel/topology.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 7415c166281f..f845a8617812 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -215,13 +215,8 @@ EXPORT_SYMBOL_GPL(cpu_topology);
 
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
-	const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
+	const cpumask_t *core_mask = &cpu_topology[cpu].core_sibling;
 
-	/* Find the smaller of NUMA, core or LLC siblings */
-	if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
-		/* not numa in package, lets use the package siblings */
-		core_mask = &cpu_topology[cpu].core_sibling;
-	}
 	if (cpu_topology[cpu].llc_id != -1) {
 		if (cpumask_subset(&cpu_topology[cpu].llc_siblings, core_mask))
 			core_mask = &cpu_topology[cpu].llc_siblings;
@@ -239,8 +234,10 @@ static void update_siblings_masks(unsigned int cpuid)
 	for_each_possible_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
-		if (cpuid_topo->llc_id == cpu_topo->llc_id)
+		if (cpuid_topo->llc_id == cpu_topo->llc_id) {
 			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
+			cpumask_set_cpu(cpuid, &cpu_topo->llc_siblings);
+		}
 
 		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
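
For illustration only (not part of the patch): below is a minimal, standalone
C sketch of the mask selection that cpu_coregroup_mask() performs after this
change. The field names (core_sibling, llc_siblings, llc_id) mirror the diff
above, but plain bitmasks and a local mask_subset() helper stand in for the
kernel's struct cpumask and cpumask_subset(), and the four-CPU, two-LLC
topology in main() is an assumed example rather than a real machine.

/*
 * Standalone sketch (not kernel code) of the post-patch selection logic:
 * start from the package siblings and only narrow to the LLC siblings
 * when they form a subset of the package mask.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define MAX_CPUS 8

struct cpu_topo {
	int llc_id;             /* -1 when no LLC information is available */
	uint32_t core_sibling;  /* CPUs in the same package */
	uint32_t llc_siblings;  /* CPUs sharing the last-level cache */
};

static struct cpu_topo cpu_topology[MAX_CPUS];

/* stand-in for cpumask_subset(): is every CPU in 'a' also in 'b'? */
static bool mask_subset(uint32_t a, uint32_t b)
{
	return (a & ~b) == 0;
}

/* mirrors cpu_coregroup_mask() after the patch */
static uint32_t coregroup_mask(int cpu)
{
	uint32_t core_mask = cpu_topology[cpu].core_sibling;

	if (cpu_topology[cpu].llc_id != -1 &&
	    mask_subset(cpu_topology[cpu].llc_siblings, core_mask))
		core_mask = cpu_topology[cpu].llc_siblings;

	return core_mask;
}

int main(void)
{
	/* assumed topology: one 4-CPU package split into two 2-CPU LLCs */
	for (int cpu = 0; cpu < 4; cpu++) {
		cpu_topology[cpu].core_sibling = 0xf;            /* CPUs 0-3 */
		cpu_topology[cpu].llc_id = cpu / 2;
		cpu_topology[cpu].llc_siblings = 0x3u << ((cpu / 2) * 2);
	}

	for (int cpu = 0; cpu < 4; cpu++)
		printf("cpu%d MC mask: 0x%x\n", cpu,
		       (unsigned int)coregroup_mask(cpu));

	return 0;
}

With that setup the MC-level mask for CPUs 0-1 narrows to the first LLC (0x3)
and for CPUs 2-3 to the second (0xc); if llc_id were -1, the mask would simply
stay at the package siblings, which is the fallback the patch relies on now
that cpumask_of_node() is no longer consulted.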