Message ID | 20200929091358.421086-2-leon@kernel.org (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | Delete the get_vector_affinity leftovers | expand |
On Tue, Sep 29, 2020 at 12:13:57PM +0300, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@nvidia.com> > > The RDMA vector affinity code is not backed up by any driver and always > returns NULL to every ib_get_vector_affinity() call. > > This means that blk_mq_rdma_map_queues() always takes fallback path. > > Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity") > Signed-off-by: Leon Romanovsky <leonro@nvidia.com> So you guys totally broken the nvme queue assignment without even telling anyone? Great job!
On Tue, Sep 29, 2020 at 12:20:46PM +0200, Christoph Hellwig wrote: > On Tue, Sep 29, 2020 at 12:13:57PM +0300, Leon Romanovsky wrote: > > From: Leon Romanovsky <leonro@nvidia.com> > > > > The RDMA vector affinity code is not backed up by any driver and always > > returns NULL to every ib_get_vector_affinity() call. > > > > This means that blk_mq_rdma_map_queues() always takes fallback path. > > > > Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity") > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com> > > So you guys totally broken the nvme queue assignment without even > telling anyone? Great job! Who is "you guys" and it wasn't silent either? I'm sure that Sagi knows the craft. https://lore.kernel.org/linux-rdma/20181224221606.GA25780@ziepe.ca/ commit 759ace7832802eaefbca821b2b43a44ab896b449 Author: Sagi Grimberg <sagi@grimberg.me> Date: Thu Nov 1 13:08:07 2018 -0700 i40iw: remove support for ib_get_vector_affinity .... commit 9afc97c29b032af9a4112c2f4a02d5313b4dc71f Author: Sagi Grimberg <sagi@grimberg.me> Date: Thu Nov 1 09:13:12 2018 -0700 mlx5: remove support for ib_get_vector_affinity Thanks
>>> From: Leon Romanovsky <leonro@nvidia.com> >>> >>> The RDMA vector affinity code is not backed up by any driver and always >>> returns NULL to every ib_get_vector_affinity() call. >>> >>> This means that blk_mq_rdma_map_queues() always takes fallback path. >>> >>> Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity") >>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> >> >> So you guys totally broken the nvme queue assignment without even >> telling anyone? Great job! > > Who is "you guys" and it wasn't silent either? I'm sure that Sagi knows the craft. > https://lore.kernel.org/linux-rdma/20181224221606.GA25780@ziepe.ca/ > > commit 759ace7832802eaefbca821b2b43a44ab896b449 > Author: Sagi Grimberg <sagi@grimberg.me> > Date: Thu Nov 1 13:08:07 2018 -0700 > > i40iw: remove support for ib_get_vector_affinity > > .... > > commit 9afc97c29b032af9a4112c2f4a02d5313b4dc71f > Author: Sagi Grimberg <sagi@grimberg.me> > Date: Thu Nov 1 09:13:12 2018 -0700 > > mlx5: remove support for ib_get_vector_affinity > > Thanks Yes, basically usage of managed affinity caused people to report regressions not being able to change irq affinity from procfs. Back then I started a discussion with Thomas to make managed affinity to still allow userspace to modify this, but this was dropped at some point. So currently rdma cannot do automatic irq affinitization out of the box.
On Tue, Sep 29, 2020 at 11:24:49AM -0700, Sagi Grimberg wrote: > Yes, basically usage of managed affinity caused people to report > regressions not being able to change irq affinity from procfs. Well, why would they change it? The whole point of the infrastructure is that there is a single sane affinity setting for a given setup. Now that setting needed some refinement from the original series (e.g. the current series about only using housekeeping cpus if cpu isolation is in use). But allowing random users to modify affinity is just a receipe for a trainwreck. So I think we need to bring this back ASAP, as doing affinity right out of the box is an absolute requirement for sane performance without all the benchmarketing deep magic.
>> Yes, basically usage of managed affinity caused people to report >> regressions not being able to change irq affinity from procfs. > > Well, why would they change it? The whole point of the infrastructure > is that there is a single sane affinity setting for a given setup. Now > that setting needed some refinement from the original series (e.g. the > current series about only using housekeeping cpus if cpu isolation is > in use). But allowing random users to modify affinity is just a receipe > for a trainwreck. Well allowing people to mangle irq affinity settings seem to be a hard requirement from the discussions in the past. > So I think we need to bring this back ASAP, as doing affinity right > out of the box is an absolute requirement for sane performance without > all the benchmarketing deep magic. Well, it's hard to say that setting custom irq affinity settings is deemed non-useful to anyone and hence should be prevented. I'd expect that irq settings have a sane default that works and if someone wants to change it, it can but there should be no guarantees on optimal performance. But IIRC this had some dependencies on drivers and some more infrastructure to handle dynamic changes...
On Fri, Oct 02, 2020 at 01:20:35PM -0700, Sagi Grimberg wrote: >> Well, why would they change it? The whole point of the infrastructure >> is that there is a single sane affinity setting for a given setup. Now >> that setting needed some refinement from the original series (e.g. the >> current series about only using housekeeping cpus if cpu isolation is >> in use). But allowing random users to modify affinity is just a receipe >> for a trainwreck. > > Well allowing people to mangle irq affinity settings seem to be a hard > requirement from the discussions in the past. > >> So I think we need to bring this back ASAP, as doing affinity right >> out of the box is an absolute requirement for sane performance without >> all the benchmarketing deep magic. > > Well, it's hard to say that setting custom irq affinity settings is > deemed non-useful to anyone and hence should be prevented. I'd expect > that irq settings have a sane default that works and if someone wants to > change it, it can but there should be no guarantees on optimal > performance. But IIRC this had some dependencies on drivers and some > more infrastructure to handle dynamic changes... The problem is that people change random settings. We need to generalize it into a sane API (e.g. the housekeeping CPUs thing which totally makes sense).
On Mon, Oct 05, 2020 at 10:38:17AM +0200, Christoph Hellwig wrote: > On Fri, Oct 02, 2020 at 01:20:35PM -0700, Sagi Grimberg wrote: > >> Well, why would they change it? The whole point of the infrastructure > >> is that there is a single sane affinity setting for a given setup. Now > >> that setting needed some refinement from the original series (e.g. the > >> current series about only using housekeeping cpus if cpu isolation is > >> in use). But allowing random users to modify affinity is just a receipe > >> for a trainwreck. > > > > Well allowing people to mangle irq affinity settings seem to be a hard > > requirement from the discussions in the past. > > > >> So I think we need to bring this back ASAP, as doing affinity right > >> out of the box is an absolute requirement for sane performance without > >> all the benchmarketing deep magic. > > > > Well, it's hard to say that setting custom irq affinity settings is > > deemed non-useful to anyone and hence should be prevented. I'd expect > > that irq settings have a sane default that works and if someone wants to > > change it, it can but there should be no guarantees on optimal > > performance. But IIRC this had some dependencies on drivers and some > > more infrastructure to handle dynamic changes... > > The problem is that people change random settings. We need to generalize > it into a sane API (e.g. the housekeeping CPUs thing which totally makes > sense). I don't see many people jump on the bandwagon, someone should do it, but who will? I personally have no knowledge in that area to do anything meaningful. Thanks
diff --git a/block/Kconfig b/block/Kconfig index bbad5e8bbffe..8ede308a1343 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -227,11 +227,6 @@ config BLK_MQ_VIRTIO depends on BLOCK && VIRTIO default y -config BLK_MQ_RDMA - bool - depends on BLOCK && INFINIBAND - default y - config BLK_PM def_bool BLOCK && PM diff --git a/block/Makefile b/block/Makefile index 8d841f5f986f..bbdc3e82308a 100644 --- a/block/Makefile +++ b/block/Makefile @@ -29,7 +29,6 @@ obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o obj-$(CONFIG_BLK_DEV_INTEGRITY_T10) += t10-pi.o obj-$(CONFIG_BLK_MQ_PCI) += blk-mq-pci.o obj-$(CONFIG_BLK_MQ_VIRTIO) += blk-mq-virtio.o -obj-$(CONFIG_BLK_MQ_RDMA) += blk-mq-rdma.o obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o obj-$(CONFIG_BLK_WBT) += blk-wbt.o obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c deleted file mode 100644 index 14f968e58b8f..000000000000 --- a/block/blk-mq-rdma.c +++ /dev/null @@ -1,44 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Copyright (c) 2017 Sagi Grimberg. - */ -#include <linux/blk-mq.h> -#include <linux/blk-mq-rdma.h> -#include <rdma/ib_verbs.h> - -/** - * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device - * @map: CPU to hardware queue map. - * @dev: rdma device to provide a mapping for. - * @first_vec: first interrupt vectors to use for queues (usually 0) - * - * This function assumes the rdma device @dev has at least as many available - * interrupt vetors as @set has queues. It will then query it's affinity mask - * and built queue mapping that maps a queue to the CPUs that have irq affinity - * for the corresponding vector. - * - * In case either the driver passed a @dev with less vectors than - * @set->nr_hw_queues, or @dev does not provide an affinity mask for a - * vector, we fallback to the naive mapping. - */ -int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map, - struct ib_device *dev, int first_vec) -{ - const struct cpumask *mask; - unsigned int queue, cpu; - - for (queue = 0; queue < map->nr_queues; queue++) { - mask = ib_get_vector_affinity(dev, first_vec + queue); - if (!mask) - goto fallback; - - for_each_cpu(cpu, mask) - map->mq_map[cpu] = map->queue_offset + queue; - } - - return 0; - -fallback: - return blk_mq_map_queues(map); -} -EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues); diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 9e378d0a0c01..5989d4e35ef3 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -12,7 +12,6 @@ #include <linux/string.h> #include <linux/atomic.h> #include <linux/blk-mq.h> -#include <linux/blk-mq-rdma.h> #include <linux/types.h> #include <linux/list.h> #include <linux/mutex.h> @@ -2171,10 +2170,8 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set) ctrl->io_queues[HCTX_TYPE_DEFAULT]; set->map[HCTX_TYPE_READ].queue_offset = 0; } - blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT], - ctrl->device->dev, 0); - blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ], - ctrl->device->dev, 0); + blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]); + blk_mq_map_queues(&set->map[HCTX_TYPE_READ]); if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) { /* map dedicated poll queues only if we have queues left */ diff --git a/include/linux/blk-mq-rdma.h b/include/linux/blk-mq-rdma.h deleted file mode 100644 index 5cc5f0f36218..000000000000 --- a/include/linux/blk-mq-rdma.h +++ /dev/null @@ -1,11 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_BLK_MQ_RDMA_H -#define _LINUX_BLK_MQ_RDMA_H - -struct blk_mq_tag_set; -struct ib_device; - -int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map, - struct ib_device *dev, int first_vec); - -#endif /* _LINUX_BLK_MQ_RDMA_H */