From patchwork Fri Jun 21 13:53:10 2024
X-Patchwork-Submitter: Daniel Wagner
X-Patchwork-Id: 13707628
From: Daniel Wagner
Date: Fri, 21 Jun 2024 15:53:10 +0200
Subject: [PATCH 1/3] sched/isolation: Add io_queue housekeeping option
Message-Id: <20240621-isolcpus-io-queues-v1-1-8b169bf41083@suse.de>
References: <20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de>
In-Reply-To: <20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de>
To: Jens Axboe, Keith Busch, Sagi Grimberg, Thomas Gleixner, Christoph Hellwig
Cc: Frederic Weisbecker, Mel Gorman, Hannes Reinecke, Sridhar Balaraman, "brookxu.cn", Ming Lei, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Daniel Wagner

Drivers such as nvme-pci spread their IO queues across all CPUs. This is
not necessarily an invalid setup when using isolcpus, but there are also
users of isolcpus who do not want any IO workload on the isolated CPUs.
Add a new housekeeping type so that the infrastructure code and drivers
are able to distinguish between the two setups.
Signed-off-by: Daniel Wagner
---
 include/linux/sched/isolation.h | 1 +
 kernel/sched/isolation.c        | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 2b461129d1fa..fe751d704e99 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -16,6 +16,7 @@ enum hk_type {
 	HK_TYPE_WQ,
 	HK_TYPE_MANAGED_IRQ,
 	HK_TYPE_KTHREAD,
+	HK_TYPE_IO_QUEUE,
 	HK_TYPE_MAX
 };

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5891e715f00d..91d7a434330c 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -18,6 +18,7 @@ enum hk_flags {
 	HK_FLAG_WQ		= BIT(HK_TYPE_WQ),
 	HK_FLAG_MANAGED_IRQ	= BIT(HK_TYPE_MANAGED_IRQ),
 	HK_FLAG_KTHREAD		= BIT(HK_TYPE_KTHREAD),
+	HK_FLAG_IO_QUEUE	= BIT(HK_TYPE_IO_QUEUE),
 };

 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -228,6 +229,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
 			continue;
 		}

+		if (!strncmp(str, "io_queue,", 9)) {
+			str += 9;
+			flags |= HK_FLAG_IO_QUEUE;
+			continue;
+		}
+
 		/*
 		 * Skip unknown sub-parameter and validate that it is not
 		 * containing an invalid character.
From patchwork Fri Jun 21 13:53:11 2024
X-Patchwork-Submitter: Daniel Wagner
X-Patchwork-Id: 13707629
From: Daniel Wagner
Date: Fri, 21 Jun 2024 15:53:11 +0200
Subject: [PATCH 2/3] nvme-pci: limit queue count to housekeeping cpus
Message-Id: <20240621-isolcpus-io-queues-v1-2-8b169bf41083@suse.de>
References: <20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de>
In-Reply-To: <20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de>
To: Jens Axboe, Keith Busch, Sagi Grimberg, Thomas Gleixner, Christoph Hellwig
Cc: Frederic Weisbecker, Mel Gorman, Hannes Reinecke, Sridhar Balaraman, "brookxu.cn", Ming Lei, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Daniel Wagner

The nvme-pci driver ignores the isolcpus configuration and allocates IO
queues for all possible CPUs. This can add noise on an isolated CPU
whenever IO is issued from that CPU, which is not always what the user
wants. Thus, only ask for as many queues as there are housekeeping CPUs.
Note, the placement of the queues is addressed in the next patch.
Signed-off-by: Daniel Wagner
---
 block/blk-mq-cpumap.c   | 13 +++++++++++++
 drivers/nvme/host/pci.c |  4 ++--
 include/linux/blk-mq.h  |  1 +
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9638b25fd521..43c039900ef6 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -11,10 +11,23 @@
 #include
 #include
 #include
+#include

 #include "blk.h"
 #include "blk-mq.h"

+unsigned int blk_mq_num_possible_queues(void)
+{
+	const struct cpumask *io_queue_mask;
+
+	io_queue_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+	if (!cpumask_empty(io_queue_mask))
+		return cpumask_weight(io_queue_mask);
+
+	return num_possible_cpus();
+}
+EXPORT_SYMBOL_GPL(blk_mq_num_possible_queues);
+
 void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 {
 	const struct cpumask *masks;

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 102a9fb0c65f..66999fa13b2c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -81,7 +81,7 @@ static int io_queue_count_set(const char *val, const struct kernel_param *kp)
 	int ret;

 	ret = kstrtouint(val, 10, &n);
-	if (ret != 0 || n > num_possible_cpus())
+	if (ret != 0 || n > blk_mq_num_possible_queues())
 		return -EINVAL;
 	return param_set_uint(val, kp);
 }
@@ -2263,7 +2263,7 @@ static unsigned int nvme_max_io_queues(struct nvme_dev *dev)
 	 */
 	if (dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS)
 		return 1;
-	return num_possible_cpus() + dev->nr_write_queues + dev->nr_poll_queues;
+	return blk_mq_num_possible_queues() + dev->nr_write_queues + dev->nr_poll_queues;
 }

 static int nvme_setup_io_queues(struct nvme_dev *dev)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 89ba6b16fe8b..2105cc78ca67 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -900,6 +900,7 @@ void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout);

+unsigned int blk_mq_num_possible_queues(void);
 void blk_mq_map_queues(struct blk_mq_queue_map *qmap);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 				int nr_hw_queues);

From patchwork Fri Jun 21 13:53:12 2024
X-Patchwork-Submitter: Daniel Wagner
X-Patchwork-Id: 13707630
From: Daniel Wagner
Date: Fri, 21 Jun 2024 15:53:12 +0200
Subject: [PATCH 3/3] lib/group_cpus.c: honor housekeeping config when grouping CPUs
Message-Id: <20240621-isolcpus-io-queues-v1-3-8b169bf41083@suse.de>
References: <20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de>
In-Reply-To: <20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de>
To: Jens Axboe, Keith Busch, Sagi Grimberg, Thomas Gleixner, Christoph Hellwig
Cc: Frederic Weisbecker, Mel Gorman, Hannes Reinecke, Sridhar Balaraman, "brookxu.cn", Ming Lei, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Daniel Wagner

group_cpus_evenly() distributes all present CPUs into groups. This
ignores the isolcpus configuration and assigns isolated CPUs to groups
as well. Make group_cpus_evenly() aware of the isolcpus configuration
and use the housekeeping CPU mask as the base for distributing the
available CPUs into groups.
Signed-off-by: Daniel Wagner
---
 lib/group_cpus.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc..f1517a44abc9 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include

 #ifdef CONFIG_SMP

@@ -330,7 +331,7 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
 }

 /**
- * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
+ * group_possible_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
  * @numgrps: number of groups
  *
  * Return: cpumask array if successful, NULL otherwise. And each element
@@ -344,7 +345,7 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
  * We guarantee in the resulted grouping that all CPUs are covered, and
  * no same CPU is assigned to multiple groups
  */
-struct cpumask *group_cpus_evenly(unsigned int numgrps)
+static struct cpumask *group_possible_cpus_evenly(unsigned int numgrps)
 {
 	unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
 	cpumask_var_t *node_to_cpumask;
@@ -423,6 +424,65 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
 	}
 	return masks;
 }
+
+static struct cpumask *group_mask_cpus_evenly(unsigned int numgrps,
+					      const struct cpumask *cpu_mask)
+{
+	cpumask_var_t *node_to_cpumask;
+	cpumask_var_t nmsk;
+	int ret = -ENOMEM;
+	struct cpumask *masks = NULL;
+
+	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+		return NULL;
+
+	node_to_cpumask = alloc_node_to_cpumask();
+	if (!node_to_cpumask)
+		goto fail_nmsk;
+
+	masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		goto fail_node_to_cpumask;
+
+	build_node_to_cpumask(node_to_cpumask);
+
+	ret = __group_cpus_evenly(0, numgrps, node_to_cpumask, cpu_mask, nmsk,
+				  masks);
+
+fail_node_to_cpumask:
+	free_node_to_cpumask(node_to_cpumask);
+
+fail_nmsk:
+	free_cpumask_var(nmsk);
+	if (ret < 0) {
+		kfree(masks);
+		return NULL;
+	}
+	return masks;
+}
+
+/**
+ * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
+ * @numgrps: number of groups
+ *
+ * Return: cpumask array if successful, NULL otherwise. And each element
+ * includes CPUs assigned to this group
+ *
+ * group_possible_cpus_evenly() is used to distribute the CPUs over all
+ * possible CPUs in the absence of the isolcpus command line argument.
+ * If the isolcpus argument is used with the io_queue option, only
+ * the housekeeping CPUs are considered.
+ */
+struct cpumask *group_cpus_evenly(unsigned int numgrps)
+{
+	const struct cpumask *io_queue_mask;
+
+	io_queue_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+	if (!cpumask_empty(io_queue_mask))
+		return group_mask_cpus_evenly(numgrps, io_queue_mask);
+	else
+		return group_possible_cpus_evenly(numgrps);
+}
 #else /* CONFIG_SMP */
 struct cpumask *group_cpus_evenly(unsigned int numgrps)
 {